On The Network You Keep: Analyzing Persons of Interest using Cliqster
Saber Shokat Fadaee, Mehrdad Farajtabar, Ravi Sundaram, Javed A. Aslam, Nikos Passas
Abstract
Our goal is to determine the structural differences between different categories of networks and to use these differences to predict the network category. Existing work on this topic has looked at social networks such as Facebook, Twitter, co-author networks, etc. We instead focus on a novel data set that we have assembled from a variety of sources, including law-enforcement agencies, financial institutions, commercial database providers and other similar organizations. The data set comprises networks of persons of interest, with each network belonging to a different category such as suspected terrorists, convicted individuals, etc. We demonstrate that such “anti-social” networks are qualitatively different from the usual social networks and that new techniques are required to identify and learn features of such networks for the purposes of prediction and classification.

We propose Cliqster, a new generative Bernoulli process-based model for unweighted networks. The generating probabilities are the result of a decomposition which reflects a network’s community structure. Using a maximum likelihood solution for the network inference leads to a least-squares problem. By solving this problem, we are able to present an efficient algorithm for transforming the network to a new space which is both concise and discriminative. This new space preserves the identity of the network as much as possible. Our algorithm is interpretable and intuitive. Finally, by comparing our research against the baseline method (SVD) and against a state-of-the-art Graphlet algorithm, we show the strength of our algorithm in discriminating between different categories of networks.

A preliminary version of this paper appeared in Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining [1].

S. Shokat Fadaee, R. Sundaram, J. A. Aslam: College of Computer and Information Science, Northeastern University
M. Farajtabar: College of Computing, Georgia Institute of Technology
N. Passas: School of Criminology and Criminal Justice, Northeastern University
Keywords
Social network analysis · Persons of interest · Community structure

[...] $G_1, G_2, \cdots, G_n$ and another graph $G_m$. We would like to find out which graph has the most similar structure to $G_m$, and whether $G_m$ can be used to reconstruct any of those graphs.

Rather than studying individuals through popular social networks (such as Twitter, Facebook, etc.), the presented research is based on a new data-set which has been collected through law-enforcement agencies, financial institutions, commercial databases and other public resources. Our data-set is a collection of networks of persons of interest. This approach of building networks from public resources has been successful because it is often easier to infer the connections among individuals from widely available resources than through the private activities of specific individuals.

[...].89% accuracy for the page rank and 40.61% accuracy for the graph degree. This justifies the quest for new techniques to identify features in the underlying structure of the networks that will enable accurate classification of their categories.

1.3 Our contributions

After performing experiments with decomposition methods (and their variants) from existing literature, we arrived at a novel technique we call Cliqster, based on decomposing the network into a linear combination of its maximal cliques, similar to the Graphlet decomposition [11] of a network. We compare Cliqster against the traditional SVD (Singular Value Decomposition) as well as state-of-the-art Graphlet methods. The most important yardstick of comparison is the discriminating power of the methods. We find that Cliqster is superior to Graphlet and significantly superior to SVD in its discriminating power, i.e., in its ability to distinguish between different categories of persons of interest. Efficiency is another important criterion and comprises both the speed of the inference algorithm and the size of the resulting representation. Both the algorithm speed and the model size are closely tied to the dimension of the bases used in the representation. Here again, the dimension of the Cliqster bases was smaller than that of the Graphlet bases in a majority of the categories and substantially smaller than that of SVD in all the categories. A third criterion is the interpretability of the model. By using cliques, Cliqster naturally captures interactions between groups or cells of individuals and is thus useful for detecting subversive sets of individuals with the potential to act in concert.
In summary, we provide a new generative statistical model for networks with an efficient inference algorithm. Cliqster is computationally efficient and intuitive, and gives interpretable results. We have also created a new and comprehensive data-set gathered from public and commercial records that has independent value. Our findings validate the promise of statistics-based technologies for categorizing and drawing inferences about sub-networks of people entirely through the structure of their network.

The remaining part of the paper is organized as follows. In §2, we briefly introduce related work. [...] In §4, experimental results are presented demonstrating the effectiveness of our algorithm in finding an appropriate and discriminating representation of a social network’s structure. At the end of that section, we present a comprehensive discussion of observations regarding the dataset. [...]

Significant attention has been given to the approach of studying criminal activity through an analysis of social networks [12], [13], [14]. The authors of [12] discovered that two-thirds of criminals commit crimes alongside another person. The authors of [13] demonstrated that charting social interactions can facilitate an understanding of criminal activity, and [14] investigated the importance of weak ties in interpreting criminal activity.

Statistical network models have also been widely studied in order to demonstrate interactions among people in different contexts. Such network models have been used to analyze social relationships, communication networks, publishing activity, terrorist networks, and protein interaction patterns, as well as many other huge data-sets. The authors of [15] considered random graphs with a fixed number of vertices and studied the properties of this model as the number of edges increases. The authors of [16] studied a related version in which every edge has a fixed probability $p$ of appearing in a network. Exchangeable random graphs [17] and exponential random graphs [18] are other important models. The authors of [19] created a toolbox to resolve duplicate nodes in a social network.

The problem of finding the roles of a person in a network has been widely studied. A link-based approach to this problem is taken in [20]. The authors of [21] studied how to identify a group of vertices that can mutually verify each other. The relationship between social roles and the diffusion process in a social network is studied in [22]. The authors of [23] combine the problems of capturing uncertainty over the existence of edges, uncertainty over attribute values of nodes, and identity uncertainty.
The authors of [24] use an unsupervised method to solve the problem of discovering the roles of a node in a network. The authors of [25] studied how network characteristics reflect the social situation of users in an online society. The authors of [26] study the role discovery problem under the assumption that nodes with similar structural patterns belong to the same role. The difference between the works of [24], [25], [26] (and similar works such as [27], [28], [29]) and our work is that they are interested in the roles of a node in a specific network, while we are interested in studying the structural differences among different networks. In this work, we assume all the nodes in a network have the same role/job. Despite the various applications of finding the roles of different sub-networks in a graph, this problem has received only a limited amount of attention. In this paper we study the role discovery problem for a network.

Recently researchers have become interested in stochastic block-modeling and latent graph models [30,31,32]. These methods attempt to analyze the latent community structure behind interactions. Instead of modeling the community structure of the network directly, we propose a simple stochastic process based on a Bernoulli trial for generating networks. We implicitly consider the community structure in the network generating model through a decomposition and projection to the space of baseline communities (cliques in our model). For a comprehensive review of statistical network models we refer interested readers to [33].

Formerly, Singular Value Decomposition was used for the decomposition of a network [34,35,36]. However, since SVD basis elements are not interpretable in terms of community structure, it cannot capture the notion of social information we are interested in quantifying.
The authors of [11] introduced the Graphlet decomposition of a weighted network; by abandoning the orthogonality constraint they were able to gain interpretability. The resulting method works with weighted graphs; however, alternate techniques, such as power graphs (which involve powering the adjacency matrix of a graph to obtain a weighted graph), need to be used in order to apply this method to unweighted graphs such as (most) social networks.

[...] $n$ nodes in the network (for example, $n = 10$ in Figure 1). Consider $Y$ as an $n \times n$ matrix representing the connectivity in the network: $Y(r, s) = 1$ if node $r$ is connected to node $s$, and 0 otherwise.
Fig. 1: Network of ten people

In Cliqster, the generative model for the network is:

$$Y = \mathrm{Bernoulli}(Z) \qquad (1)$$

which means $Y(r, s) = Y(s, r) = 1$ with probability $Z(r, s)$, and $Y(r, s) = Y(s, r) = 0$ with probability $1 - Z(r, s)$, for all $r > s$. Since the graph is undirected the matrix $Z$ is lower triangular.

Inspired by PCA and SVD, in Cliqster we choose to represent $Z$ in a new space [34], [36]. Community structure is a key factor to understand and analyze a network, and because of this we are motivated to choose bases in a way that reflects the community structure [35]. Consequently, we decided to factorize $Z$ as

$$Z = \sum_{k=1}^{K} \mu_k B_k \qquad (2)$$

where $K$ is the number of maximal cliques (bases), $B_k$ is the $k$th lower triangular basis matrix, which represents the $k$th maximal clique, and $\mu_k$ is its contribution to the network. In Section 3.4 we elaborate on this basis selection process. From this point forward, we consider these bases as cliques of a network. We also represent a network in this new space. Each network is parameterized by the coefficients and the bases which construct $Z$, the network’s generating matrix.

3.2 Inference

Given a network $Y$ of people and their connections, our goal is to infer the parameters generating this network. We first assume the bases are selected as baseline cliques. The likelihood of the network parameters (coefficients) given the observation is:

$$L(\mu_{1:K}) = \prod_{r>s:\,Y(r,s)=1} Z(r, s) \prod_{r>s:\,Y(r,s)=0} \bigl(1 - Z(r, s)\bigr)$$

We estimate these parameters by maximizing their likelihood under the constraint $0 \le Z(r, s) \le 1$ for all $r > s$. One can easily see the likelihood is maximized when $Z(r, s) = 1$ if $Y(r, s) = 1$ and $Z(r, s) = 0$ if $Y(r, s) = 0$. Therefore

$$Y = \sum_{k=1}^{K} \mu_k B_k \qquad (3)$$

should be used for the lower triangle of $Y$. Unfolding the above equation results in

$$\begin{aligned}
Y(2, 1) &= \mu_1 B_1(2, 1) + \ldots + \mu_K B_K(2, 1) \\
Y(3, 1) &= \mu_1 B_1(3, 1) + \ldots + \mu_K B_K(3, 1) \\
Y(3, 2) &= \mu_1 B_1(3, 2) + \ldots + \mu_K B_K(3, 2) \\
&\;\;\vdots \\
Y(n, n-1) &= \mu_1 B_1(n, n-1) + \ldots + \mu_K B_K(n, n-1)
\end{aligned}$$

Let

$$\mu = (\mu_1, \ldots, \mu_K)^\top, \qquad b_{rs} = \bigl(B_1(r, s), \ldots, B_K(r, s)\bigr)^\top \qquad (4)$$

So the new objective function can be written as

$$J = \sum_{r>s} \bigl(\mu^\top b_{rs} - Y(r, s)\bigr)^2 \qquad (5)$$

$J$ is convex with respect to $\mu$ under the constraints $0 \le \mu^\top b_{rs} \le 1$. This is essentially a constrained least squares problem, which can be solved through existing efficient algorithms [37], [38]. Through this formulation, the representation parameters $\mu_{1:K}$ are computed easily and we are done with the inference procedure.

We turn our attention to the new representation and try to find an algorithm which can produce a more interpretable result. The exact generating parameters are no longer needed in our application. Therefore, by relaxing the constraints we are able to present a simple and very efficient algorithm. In addition, the solution to this unconstrained problem provides us with an intuitive understanding of what is happening behind this inference procedure. To determine the optimal parameters, we take the derivative with respect to $\mu$:

$$\frac{\partial J}{\partial \mu} = 2 \sum_{r>s} b_{rs} \bigl(b_{rs}^\top \mu - Y(r, s)\bigr) \qquad (6)$$

Equating the above derivative to zero and rearranging gives the solution

$$\mu = A^{-1} d \qquad (7)$$

where

$$A = \sum_{r>s} b_{rs} b_{rs}^\top, \qquad d = \sum_{r>s} Y(r, s)\, b_{rs} \qquad (8)$$

$A$ is a $K \times K$ matrix and $d$ is a $K \times 1$ vector. [...] $O(n^2)$ constraints. Despite this fact, we obtain very good results, and we will soon explain why this happens.

Our novel decomposition method finds $\mu$, which is used to represent a network, and which could stand in for a network in network analysis applications. This representation is used in the next section in order to discriminate between different types of networks.

The results of the decomposition of the network presented in Figure 1 are shown in Table 1.

Table 1: $\mu$ within each cluster (columns: Cluster members, $\mu$)

[...] $A$ and $d$ give you an intuition about the network. For further insight into this process, consider the matrix $A$. Every entry of this matrix is equal to the number of edges shared by the two corresponding cliques.
This matrix encodes the power relationships between baseline clusters, as a part of network reconstruction. The intersection between two bases shows how much one basis can overpower another basis as they are reconstructing a network. In contrast, $d$ presents the commonalities between a given network and its baseline communities. Through this vector, a community’s contribution to a network is encoded.

With this interpretation in mind, the equation $A\mu = d$ is now more meaningful for understanding the significance of our new representation of a network. Consider multiplying the first row of the matrix by the vector $\mu$, which should be equal to $d_1$. To satisfy this equation, the coefficients are chosen so that when the intersections of cluster 1 with every cluster are multiplied by their corresponding coefficients and added together, the result equals the first cluster’s contribution to the network construction.

3.4 Basis Selection

Users in a persons-of-interest network usually form associations in particular ways; thus, community structure is a good distinguishing factor for different networks. Different structures can form a community. One interesting such structure is the set of maximal cliques of the community, which we use as the bases of our method. There are many ways to compute the maximal cliques of a network. We use the Bron-Kerbosch algorithm [39] for identifying our network’s communities. As mentioned in [11], this is one of the most efficient algorithms for identifying all of the maximal cliques in an undirected network. After applying the Bron-Kerbosch algorithm to the network in Figure 1, we identify the communities that are listed in Table 1. The Bron-Kerbosch algorithm is described in Algorithm 1.

Algorithm 1 Bron-Kerbosch algorithm
    C = ∅        ▷ We keep the maximal clique in C
    I = V(G)     ▷ The set of vertices that can be added to C
    X = ∅        ▷ The set of vertices that are connected to C but are excluded from it
    procedure Enumerate(C, I, X)
        if I == ∅ and X == ∅ then
            C is a maximal clique
        else
            for each vertex v in I do
                Enumerate(C ∪ {v}, I ∩ N(v), X ∩ N(v))
                I ← I \ {v}
                X ← X ∪ {v}

The Bron-Kerbosch algorithm has many different versions. We use the version introduced in [40].
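The basis selection of Algorithm 1 and the unconstrained solution of Equations (7) and (8) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the basic pivot-free recursion of Algorithm 1 rather than the variant of [40], solves $A\mu = d$ by plain Gauss-Jordan elimination, and assumes $A$ is invertible. The function names and the five-node toy graph are hypothetical.

```python
from itertools import combinations


def bron_kerbosch(adj):
    """Enumerate maximal cliques with the basic recursion of Algorithm 1."""
    cliques = []

    def enumerate_cliques(c, i, x):
        if not i and not x:
            cliques.append(sorted(c))  # C cannot be extended: it is maximal
            return
        for v in sorted(i):  # iterate over a snapshot of I
            enumerate_cliques(c | {v}, i & adj[v], x & adj[v])
            i = i - {v}  # v has been fully explored
            x = x | {v}  # exclude v from cliques found later

    enumerate_cliques(set(), set(adj), set())
    return sorted(cliques)


def clique_edges(clique):
    """Lower-triangle entries (r, s), r > s, equal to 1 in the basis B_k."""
    return {(max(u, v), min(u, v)) for u, v in combinations(clique, 2)}


def solve(a, d):
    """Solve a * mu = d by Gauss-Jordan elimination with partial pivoting."""
    n = len(d)
    m = [list(row) + [d[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("A is singular; a pseudo-inverse would be needed")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def cliqster_coefficients(edges):
    """Return (cliques, A, d, mu) for an undirected, unweighted edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = bron_kerbosch(adj)
    graph_edges = {(max(u, v), min(u, v)) for u, v in edges}
    bases = [clique_edges(c) for c in cliques]
    # A[j][k] = number of edges shared by cliques j and k (Eq. 8)
    a = [[len(bj & bk) for bk in bases] for bj in bases]
    # d[k] = number of observed edges covered by clique k (Eq. 8)
    d = [len(bk & graph_edges) for bk in bases]
    return cliques, a, d, solve(a, d)


# Hypothetical toy graph: two triangles sharing one edge, plus a pendant edge.
toy_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
cliques, a, d, mu = cliqster_coefficients(toy_edges)
```

On this toy graph the two triangles {0, 1, 2} and {1, 2, 3} share the edge between nodes 1 and 2, so the corresponding off-diagonal entry of $A$ is 1 and both triangles receive the coefficient 0.75, while the clique {3, 4} receives 1.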
One of the most successful aspects of this algorithm is that it provides a multi-resolution perspective of the network. This algorithm identifies communities at a variety of scales, which, as we will see, allows us to locate the most natural and representative set of coefficients and bases.

3.5 Complexity

The aforementioned inference equation requires $A$ and $d$ to be computed, which can be done in $O(m + n)$ time, where $m$ is the number of edges and $n$ is the number of nodes in the network. The least-squares solution requires $O(K^3)$ operations. A graph’s degeneracy measures its sparsity and is the smallest value $f$ such that every nonempty induced subgraph of that graph contains a vertex of degree at most $f$ [41]. The authors of [40] proposed a variation of the Bron-Kerbosch algorithm which runs in $O(f n 3^{f/3})$ time, where $f$ is the network’s degeneracy. This is close to the best possible running time, since the largest possible number of maximal cliques in an $n$-vertex graph with degeneracy $f$ is $(n - f) 3^{f/3}$ [40].

A power law graph is a graph in which the number of vertices with degree $d$ is proportional to $d^{-\alpha}$, where $1 \le \alpha \le 3$. When $1 < \alpha \le 2$ we have $f = O(n^{1/(2\alpha)})$, and when $2 < \alpha < 3$ we have $f = O(n^{(3-\alpha)/4})$ [42]. Combining this with the running time $O(f n 3^{f/3})$ of the Bron-Kerbosch variant [40], we find the running time for finding all maximal cliques in a power law graph to be $2^{O(\sqrt{n})}$. However, the maximum number of cliques in graphs based on real-world networks is typically $O(\log n)$ [11].

In this section we investigate the properties of the new features we have learned about the network in question. Firstly, we introduce the new dataset we have built. Our experiments attempt to support two claims:

1. the new representation is concise, and
2. it can discriminate between different network types.

We compare our results with the SVD decomposition and Graphlet decomposition algorithms [11].

4.1 Dataset

We assembled a dataset by gathering and fusing information from a variety of public and commercial sources. Our final dataset comprised around 750,000 persons of interest with 3,000,000 connections among them. We then filtered this dataset to slightly less than 550,000 individuals who fell into one of the following 5 categories:

1. Suspicious Individuals: Persons who have appeared on sanctioned lists, been arrested or detained, but not been convicted of a crime.
2. Convicted Individuals: Persons who have been indicted, tried and convicted in a court of law.
3. Lawyers/Legal Professionals: Persons currently employed in a legal profession.
4. Politically Exposed Persons: Elected officials, heads of parties, or persons who hold or have held political positions.
5. Suspected Terrorists: Persons suspected of aiding, abetting or committing terrorist activities.

This dataset is publicly available at [9].

Table 2: Categories and corresponding sizes, plus the number of connected components and the density of each category

Category                       Members    Components   Density
Suspicious Individuals         316,990    77,811       0.0000180
Convicted Individuals          165,411    35,517       0.0000427
Lawyers/Legal Professionals    3,723      1,492        0.0006220
Politically Exposed Persons    13,776     4,947        0.0001533
Suspected Terrorists           31,817     5,016        0.0002068
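For readers who want to compute the Components and Density columns of Table 2 on their own edge lists, the following sketch may help. It assumes the standard definition of density for an undirected simple graph, $2m / (n(n-1))$, which the text does not state explicitly; the function name and the toy input are hypothetical.

```python
def components_and_density(n_nodes, edges):
    """Count connected components (union-find) and compute 2m / (n(n-1))."""
    parent = list(range(n_nodes))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Ignore self-loops and duplicate edges (simple undirected graph assumed).
    unique_edges = {(min(u, v), max(u, v)) for u, v in edges if u != v}
    for u, v in unique_edges:
        parent[find(u)] = find(v)
    n_components = len({find(v) for v in range(n_nodes)})
    density = 2 * len(unique_edges) / (n_nodes * (n_nodes - 1))
    return n_components, density


# Hypothetical toy network: a path on three nodes, one edge, one isolated node.
toy_components, toy_density = components_and_density(6, [(0, 1), (1, 2), (3, 4)])
```

For the toy input this gives three components and a density of 0.2.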
The color scheme we use for our figures is as follows: red for Suspicious Individuals (SI), blue for Convicted Individuals (CI), brown for Lawyers/Legal Professionals (LL), orange for Politically Exposed Persons (PEPS), and black for Suspected Terrorists (ST).

4.2 Basic properties

We want to know whether our dataset has the common properties of social networks, such as a power-law degree distribution. The first thing to check is the degree distribution of each subnetwork, and whether it can be fitted to a power-law distribution. If the degree distribution of a subnetwork follows a power law, the subnetwork is scale-free. We used the poweRlaw [43] and igraph [44] packages to calculate the maximum likelihood power-law fit of the Legal subnetwork, and the results are shown in Figure 2. It looks like a scale-free network, but we need to check this with more accurate measures. In a power-law distribution, $P(X = x)$ is of the form $c x^{-\alpha}$. The $\alpha$ of each subnetwork can be seen in Table 3. Each of our subnetworks can be fitted to a power-law distribution, so all of them are scale-free networks. However, these networks are not small-world networks: the number of connected components in each network indicates that, starting from a given node, it is impossible to reach most of the other nodes in that network.

Fig. 2: The cumulative distribution function and its maximum likelihood power-law fit for the Legal subnetwork (x-axis: Neighbors; y-axis: CDF)

Table 3: Alpha, the exponent of the fitted power-law distribution in each category

Category                       α
Suspicious Individuals         1.838563
Convicted Individuals          1.733839
Lawyers/Legal Professionals    2.977307
Politically Exposed Persons    3.107326
Suspected Terrorists           1.770715
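As a rough illustration of the kind of maximum likelihood fit reported in Table 3, the sketch below estimates $\alpha$ from a list of node degrees with the closed-form approximation $\hat{\alpha} \approx 1 + n / \sum_i \ln\bigl(x_i / (x_{\min} - 1/2)\bigr)$ for discrete data. This is not the poweRlaw routine used in the paper, and the approximation is known to be coarse for small $x_{\min}$; the function name, the default $x_{\min}$, and the toy degree list are assumptions made for illustration.

```python
import math


def powerlaw_alpha(degrees, x_min=1):
    """Approximate MLE of alpha for P(X = x) ~ x^(-alpha), x >= x_min."""
    tail = [x for x in degrees if x >= x_min]
    if not tail:
        raise ValueError("no degrees at or above x_min")
    # Discrete-data approximation: shift x_min by one half.
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum


toy_degrees = [1, 1, 1, 2, 2, 4]  # hypothetical degree sequence
alpha = powerlaw_alpha(toy_degrees)  # about 1.87
```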
[...] 1,000 vertices as a sample. We then analyze this sample, repeat this operation 1,000 times, and represent the average with bold lines in the following graphs. All figures also show a margin of two standard deviations around the average, drawn as a line in a lighter variation of the same color. We analyzed this data with three different methods: Singular Value Decomposition, Graphlet Decomposition, and our own proposed model.

Fig. 3: Number of bases and amplitude of coefficient for Convicted Individuals using SVD (x-axis: Coefficient Index; y-axis: Amplitude of Coefficient)

4.4 Singular Value Decomposition

We first analyzed our data using the Singular Value Decomposition method [34]. Figure 3 shows the effective number of non-zero coefficients for this algorithm. Figure 4 demonstrates the ability of this algorithm to discriminate between two different categories. Finally, the ability of the algorithm to distinguish between the 5 categories is illustrated in Figure 5. The average number of bases we observed in the samples of 1,000 vertices is around 800, as can be seen in Figures 3, 4 and 5.

4.5 Graphlet Decomposition

We next performed the same tests using Graphlet Decomposition. Figure 6 demonstrates the effective number of non-zero coefficients for this algorithm. Figure 7 shows the ability of this algorithm to discriminate between two different types of networks. The algorithm’s ability to distinguish between the 5 categories is again illustrated in Figure 8. As can be seen in these figures, the number of basis elements for Graphlet Decomposition is around 20.
Fig. 4: Comparison of coefficients between Terrorist sub-networks and Legal sub-networks using SVD (panel title: Terrorism vs Legal; legend: ST, LL; x-axis: Coefficient Index; y-axis: Amplitude of Coefficient)

Fig. 5: The ability of the SVD method to distinguish between different categories of networks (panel title: ALL; legend: SI, CI, PEPS, ST, LL; x-axis: Coefficient Index; y-axis: Amplitude of Coefficient)

4.6 Cliqster

Finally, we performed the same tests using our method. We first determined appropriate bases using the Bron-Kerbosch algorithm. We then computed $A$ and $d$. The new representation for a sample network of one category that resulted from our method is shown in Figure 9. Figure 10 shows the ability of our algorithm to discriminate between two different types of networks. Our algorithm’s ability to distinguish between the 5 categories is illustrated in Figure 11, which also shows that the number of basis elements for Cliqster is around 50.

Fig. 6: Number of bases and amplitude of coefficient for Convicted Individuals using the Graphlet Decomposition algorithm (x-axis: Coefficient Index; y-axis: Amplitude of Coefficient)

4.7 Performance

We analyzed the time complexity of Cliqster in Section 3.5. We now check whether the empirical results agree with that analysis. For the Convicted Individuals subnetwork we ran both our method and SVD using the igraph package in R. The performance of the Graphlet method is very similar to that of Cliqster, so we do not include it in this experiment.

We ran our experiment on an “Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz (8 CPUs), 3.4GHz” processor with “[...]” of memory. As can be seen in Figure 12, as we grow the sample size our method runs about twice as fast as the SVD method.
Fig. 7: Comparison of coefficients between Terrorist sub-networks and Legal sub-networks using the Graphlet Decomposition algorithm (panel title: Terrorism vs Legal; legend: ST, LL; x-axis: Coefficient Index; y-axis: Amplitude of Coefficient)

4.8 Distinguishability

In order to compare the ability of each of these methods to distinguish between different types of social networks, we sampled 100 networks from each category, combined all of these samples, ran the K-means clustering algorithm (with 5 as the number of clusters), and repeated this procedure 100 times. We used each network’s 20 largest coefficients, and wanted to know whether the coefficients of different sub-networks can be distinguished from each other. We gave the combined coefficients of all sub-networks to the K-means clustering algorithm as input, and calculated the mean clustering error. As can be seen in Table 4, our method often returns the bases with the best ability to distinguish between the types of social network presented. The Graphlet Decomposition slightly outperforms our method in two of the sub-networks, but the difference is negligible in practice.
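The clustering experiment can be sketched in a few lines. The sketch makes several assumptions that the text does not confirm: it uses plain Lloyd iterations with a deterministic initialization, and it measures the clustering error as the fraction of samples whose category differs from the majority category of their cluster. The function names and the toy coefficient vectors are hypothetical.

```python
from collections import Counter


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, n_iter=50):
    """Lloyd's algorithm; initial centroids are every (len/k)-th point."""
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move every centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def clustering_error(assignment, labels):
    """Fraction of points whose label is not the majority label of their cluster."""
    errors = 0
    for c in set(assignment):
        in_cluster = [lab for a, lab in zip(assignment, labels) if a == c]
        errors += len(in_cluster) - Counter(in_cluster).most_common(1)[0][1]
    return errors / len(labels)


# Hypothetical two-coefficient feature vectors for two categories.
toy_points = [[1.0, 0.9], [0.9, 1.0], [0.1, 0.0], [0.0, 0.1]]
toy_labels = ["ST", "ST", "LL", "LL"]
toy_assignment = k_means(toy_points, k=2)
toy_error = clustering_error(toy_assignment, toy_labels)
```

In the setting of the paper, `points` would hold the 20 largest coefficients of each sampled network and `k` would be 5; averaging `clustering_error` over repeated samplings would then give one number per method.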
SICI
ALL
Coefficient Index A m p li t ude o f C oe ff i c i en t . . . . PEPSSTLL
Fig. 8: The ability of Graphlet method to distinguish between differentcategories of networksTable 4: Mean error of clustering with 20 coefficients ( µ ) Category SVD Graphlet Cliqster
SI 0.51461
LL 0.75006 0.10931
PEPS 0.66082 0.12195
ST 0.65381 k − nearest neighborsalgorithm (or k − N N for short). k − N N is a non-parametric method that isused for classification in a supervised setting. Let’s assume we want to comparethe features that are used to distinguish between these two groups: Suspicious
Convicted Individual
Coefficient Index A m p li t ude o f C oe ff i c i en t . . . . . . Fig. 9: Number of bases and amplitude of coefficient for ConvictedIndividuals using Cliqster
STLL
Terrorism vs Legal
Coefficient Index A m p li t ude o f C oe ff i c i en t . . . . . . . Fig. 10: Comparison of coefficients between Terrorist sub networks and Legalsub networks using Cliqster n The Network You Keep: Analyzing Persons of Interest using Cliqster 19
Fig. 11: The ability of Cliqster to distinguish between different categories ofnetworks
Performance
Sample Size E l ap s ed t i m e i n s e c ond s CliqsterSVD
Fig. 12: Comparison of performance between Cliqster and SVD
Suspicious Individuals versus Convicted Individuals
Size of training set A cc u r a cy Fig. 13: The accuracy of community detection based on the training sizeIndividuals and Convicted Individuals. We train Cliqster with samples of size1 ,
000 that are randomly selected from both communities, gather the featuresand repeat this operation 1 ,
000 times. After that we run the k − N N with k = 3 and a test data of size 100. In order to avoid ties, we need to pick anodd number for k in case of binary classification. When we set k = 3 we arelooking at the classification problem in a 3 dimensional space. We also makesure there is no intersection between the members of training and test sets toavoid the problem of over-fitting.Figure 13 shows the result of this experiment. With using a training set ofsize 40 we can classify these two groups with an accuracy of 97%. It basicallymeans that when we have a training set of size 40, K-NN can learn how todistinguish between these two groups with an accuracy of 97%.Things are a little bit different when it comes to comparing the behav-ior of Lawyers/Legal professionals network and Politically Exposed Personsnetwork. As you can see in figure 14 we need a training set of size 100 toreach to an accuracy of 74%. This difference suggest a contrast between thecharacteristics of these networks. According to Cliqster, the network structureof Lawyers/Legal professionals and the network structure of Politically Ex-posed Persons have more in common than the network structure of SuspiciousIndividuals and the network structure of Convicted Individuals.If we analyze the network structure of Suspected Terrorists and compare itwith network structure of Convicted Individuals, we will see that after usinga training set of size around 20 we reach to the 100% accuracy. k − N N n The Network You Keep: Analyzing Persons of Interest using Cliqster 21
Lawyer/Legal professionals versus PEPS
Size of training set A cc u r a cy Fig. 14: The accuracy of community detection based on the training sizecan classify these two groups with no error 15. Now we compare the networkstructure of Suspected Terrorists and Politically Exposed Persons networks16. After using a training set of size 50, we reach to the 99% accuracy.4.10 DiscussionFigures 3, 6, and 9 compare the ability of the three methods to compress data.These graphs demonstrate that the SVD method is inefficient for summarizinga network’s features. The graph also shows that the Graphlet method producesthe smallest feature space. Our representation is also very small, however,and the difference in size produced through these methods is negligible inreal world applications of this equation. Earlier we demonstrated that the20 largest coefficients in the representation produced through our method issufficient to outperform the Graphlet algorithm in terms of distinguish abilityand clustering.Figures 4, 7, and 10 demonstrate the ability of the algorithms to distinguishbetween two selected categories. When comparing our method with the SVDand Graphic Decomposition methods, the coefficients seem to be very similarbetween those produced by our method and the SVD method, however, ourmethod also performs as well as the Graphlet Decomposition method in distin-guishing between two types of networks. This demonstrates that communitystructure is a natural basis for interpreting social networks. By decomposinga network into cliques, our method provides an efficient transformation that is
concise and easier to analyze than SVD bases, which are constrained by their requirement to be orthogonal. Figures 5, 8, and 11 verify these claims for all 5 categories.

Fig. 15: The accuracy of community detection based on the training size (Suspected Terrorists versus Convicted Individuals; x-axis: size of training set, y-axis: accuracy)

Fig. 16: The accuracy of community detection based on the training size (Suspected Terrorists versus PEPs; x-axis: size of training set, y-axis: accuracy)

Table 4 demonstrates how consistently our algorithm summarizes each network according to its category. We clustered all coefficients using k-means. Through this process, it became clear that the SVD method could not identify the category of the network being analyzed. From this we infer that selecting the community structure (cliques) as bases considerably improves our ability to identify a network. Our proposed algorithm was also more accurate in clustering than the Graphlet Decomposition algorithm. This suggests that the Bernoulli distribution (as used in the seminal work of Erdős and Rényi) is a simpler and more natural process for generating networks. Our proposed method is also easier to interpret and, unlike the Graphlet method, does not run the risk of getting stuck in local minima.

Finally, Figures 13, 14, 15, and 16 demonstrate the ability of k-NN to classify features produced by Cliqster in binary classification settings. They also offer some interpretation of the similarities and differences between the network structures of different groups.
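The binary classification experiments discussed in this section can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors are synthetic Gaussian stand-ins for Cliqster coefficient vectors, the labels "A" and "B" are placeholders for two network categories, the Euclidean distance is an assumed choice of metric, and all function names are hypothetical. The sketch only shows the protocol described in the text: k-NN with k = 3, a fixed test set of size 100, disjoint training and test sets, and accuracy reported as a function of training-set size.

```python
import math
import random
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # With two labels and an odd k (k <= training size), the vote cannot tie.
    return votes.most_common(1)[0][0]


def disjoint_split(n_items, n_train, n_test, rng):
    """Pick training and test indices that share no element."""
    order = list(range(n_items))
    rng.shuffle(order)
    return order[:n_train], order[n_train:n_train + n_test]


def knn_accuracy(features, labels, n_train, n_test, k=3, seed=0):
    """Fraction of held-out vectors whose label k-NN recovers."""
    rng = random.Random(seed)
    train_idx, test_idx = disjoint_split(len(features), n_train, n_test, rng)
    train_x = [features[i] for i in train_idx]
    train_y = [labels[i] for i in train_idx]
    hits = sum(
        knn_predict(train_x, train_y, features[i], k) == labels[i]
        for i in test_idx
    )
    return hits / len(test_idx)


def synthetic_category(center, n_items, dim, rng):
    """Hypothetical stand-in for the coefficient vectors of one network category."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n_items)]


rng = random.Random(42)
features = synthetic_category(0.0, 150, 5, rng) + synthetic_category(3.0, 150, 5, rng)
labels = ["A"] * 150 + ["B"] * 150

# Accuracy as a function of training-set size, with a fixed test size of 100.
for n_train in (10, 20, 40, 100):
    print(n_train, knn_accuracy(features, labels, n_train, 100))
```

Drawing both index sets from a single shuffled ordering is what guarantees that no network appears in both the training and the test set. Repeating the call with different seeds and averaging would give a smoother accuracy curve.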
After proposing Cliqster, a new generative model for decomposing random networks, we applied this method to our new dataset of persons of interest. Our primary discovery in this research is that a variant of our decomposition method provides a statistical test capable of accurately discriminating between different categories of social networks. The resulting method is both accurate and efficient. We created similar discriminants based on the traditional Singular Value Decomposition and Graphlet methods, and found that they are less capable of discriminating between social network categories. Our research also demonstrates that community structure, in the form of cliques, is a natural choice for bases. This choice allows for a high degree of compression while preserving the identity of the network very well. The new representation produced by our method is concise and discriminative.

Comparing the three methods, we found that the Graphlet bases and our bases had significantly lower dimension than the SVD bases, while still accurately identifying the category of the network being analyzed. Our method is therefore an accurate and efficient means of identifying different network types.

On the non-technical side, we would like to see how we can get law-enforcement agencies to adopt our methods. On the technical front, there are a number of directions for further research. We would like to extend our simple, intuitive algorithm to weighted networks, such as networks with an edge-generating process based on the Gamma distribution. The Maximum Likelihood solution for a network is subject to over-fitting or biased estimation. Adding a regularization term would adjust for this discrepancy. A natural choice for such a term would be a sparse regularization, which is in accordance with real social networks. Another promising direction is to incorporate prior knowledge into Cliqster by using Bayesian inference. A further natural avenue for investigation is to consider how Cliqster can be adapted to regular social networks.
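As an illustration of the sparse regularization mentioned above, one possible form is an $\ell_1$ penalty added to the least-squares problem that the maximum likelihood solution leads to. The notation here is assumed for the sake of the sketch and is not taken from the derivation: $y$ denotes the vector of observed edge indicators, $B$ the matrix whose columns are the clique bases, $\mu$ the coefficient vector, and $\lambda$ a tuning parameter.

```latex
\[
  \hat{\mu}_{\lambda}
    = \arg\min_{\mu}\; \lVert y - B\mu \rVert_2^2 + \lambda \lVert \mu \rVert_1,
  \qquad \lambda \ge 0 .
\]
```

Under these assumptions, setting $\lambda = 0$ recovers the unregularized least-squares problem, while larger values of $\lambda$ drive more coefficients to exactly zero.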
Acknowledgment
The authors would like to thank Hossein Azari Soufiani for his comments on different aspects of this work.
References
1. S. Shokat Fadaee, M. Farajtabar, R. Sundaram, J. Aslam, and N. Passas, “The network you keep: Analyzing persons of interest using cliqster,” in Advances in Social Networks Analysis and Mining (ASONAM), 2014 IEEE/ACM International Conference on
Networks, Crowds, and Markets: Reasoning About a Highly Connected World. New York, NY, USA: Cambridge University Press, 2010.
11. H. Azari Soufiani and E. M. Airoldi, “Graphlets decomposition of a weighted network,” Journal of Machine Learning Research, 2012.
12. A. Reiss, “Understanding changes in crime rates,” in Crime and Justice: A Review of Research, vol. 10, 1980.
13. E. L. Glaeser, B. Sacerdote, and J. A. Scheinkman, “Crime and social interactions,” The Quarterly Journal of Economics, vol. 111, no. 2, pp. 507–48, May 1996.
14. E. Patacchini and Y. Zenou, “The strength of weak ties in crime,” European Economic Review, vol. 52, no. 2, pp. 209–236, 2008.
15. P. Erdős and A. Rényi, “On random graphs,” Publicationes Mathematicae Debrecen, vol. 6, pp. 290–297, 1959.
16. E. N. Gilbert, “Random graphs,” The Annals of Mathematical Statistics, vol. 30, no. 4, pp. 1141–1144, 1959.
17. E. M. Airoldi, “Bayesian mixed-membership models of complex and evolving networks,” DTIC Document, Tech. Rep., 2006.
18. G. Robins, P. Pattison, Y. Kalish, and D. Lusher, “An introduction to exponential random graph (p*) models for social networks,” Social Networks, vol. 29, no. 2, pp. 173–191, 2007, special section: Advances in Exponential Random Graph (p*) Models.
19. M. Bilgic, L. Licamele, L. Getoor, and B. Shneiderman, “D-dupe: An interactive tool for entity resolution in social networks,” in Visual Analytics Science and Technology (VAST), Baltimore, October 2006.
20. G. Barta, “A link-based approach to entity resolution in social networks,” CoRR, vol. abs/1404.3017, 2014.
21. Y.-C. Lo, J.-Y. Li, M.-Y. Yeh, S.-D. Lin, and J. Pei, “What distinguish one from its peers in social networks?” Data Mining and Knowledge Discovery, vol. 27, no. 3, pp. 396–420, 2013.
22. Y. Yang, J. Tang, C. W.-k. Leung, Y. Sun, Q. Chen, J. Li, and Q. Yang, “Rain: Social role-aware information diffusion,” 2014.
23. W. E. Moustafa, A. Kimmig, A. Deshpande, and L. Getoor, “Subgraph pattern matching over uncertain graphs with identity linkage uncertainty,” CoRR, vol. abs/1305.7006, 2013.
24. K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, C. Faloutsos, and L. Li, “Rolx: Structural role extraction & mining in large graphs,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’12. New York, NY, USA: ACM, 2012, pp. 1231–1239.
25. Y. Zhao, G. Wang, P. S. Yu, S. Liu, and S. Zhang, “Inferring social roles and statuses in social networks,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’13. New York, NY, USA: ACM, 2013, pp. 695–703.
26. R. A. Rossi and N. K. Ahmed, “Role discovery in networks,” IEEE Transactions on Knowledge and Data Engineering, vol. 99, no. PrePrints, p. 1, 2014.
27. K. Li, S. Guo, N. Du, J. Gao, and A. Zhang, “Learning, analyzing and predicting object roles on dynamic networks,” in Data Mining (ICDM), 2013 IEEE 13th International Conference on, Dec 2013, pp. 428–437.
28. S. Bhagat, G. Cormode, and S. Muthukrishnan, “Node classification in social networks,” CoRR, vol. abs/1101.3291, 2011.
29. H. Xu, Y. Yang, L. Wang, and W. Liu, “Node classification in social network via a factor graph model,” in Advances in Knowledge Discovery and Data Mining, ser. Lecture Notes in Computer Science, J. Pei, V. Tseng, L. Cao, H. Motoda, and G. Xu, Eds. Springer Berlin Heidelberg, 2013, vol. 7818, pp. 213–224.
30. K. Nowicki and T. A. B. Snijders, “Estimation and prediction for stochastic blockstructures,” Journal of the American Statistical Association, vol. 96, no. 455, pp. 1077–1087, 2001.
31. E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, “Mixed membership stochastic blockmodels,” Journal of Machine Learning Research, 2008.
32. B. Karrer and M. E. Newman, “Stochastic blockmodels and community structure in networks,” Physical Review E, vol. 83, no. 1, p. 016107, 2011.
33. A. Goldenberg, A. X. Zheng, S. E. Fienberg, and E. M. Airoldi, “A survey of statistical network models,” ArXiv e-prints, Dec. 2009.
34. F. R. K. Chung, “Spectral graph theory,” American Mathematical Society, 1997.
35. P. Hoff, “Multiplicative latent factor models for description and prediction of social networks,” Computational & Mathematical Organization Theory, vol. 15, no. 4, pp. 261–272, 2009.
36. M. Kim and J. Leskovec, “Multiplicative attribute graph model of real-world networks,” Internet Mathematics, vol. 8, no. 1-2, pp. 113–160, 2012.
37. C. L. Lawson and R. J. Hanson, Solving least squares problems. SIAM, 1974, vol. 161.
38. S. P. Boyd and L. Vandenberghe, Convex optimization. Cambridge University Press, 2004.
39. C. Bron and J. Kerbosch, “Finding all cliques of an undirected graph,” Communications of the ACM, 1973.
40. D. Eppstein and D. Strash, “Listing all maximal cliques in large sparse real-world graphs,” in Experimental Algorithms. Springer, 2011, pp. 364–375.
41. D. R. Lick and A. T. White, “k-degenerate graphs,” Canad. J. Math., vol. 22, pp. 1082–1096, 1970.
42. A. Buchanan, J. Walteros, S. Butenko, and P. Pardalos, “Solving maximum clique in sparse graphs: an $O(nm + n2^{d/4})$ algorithm for d-degenerate graphs,” Optimization Letters, 2013.
43. C. S. Gillespie, “Fitting heavy tailed distributions: The poweRlaw package,” Journal of Statistical Software, vol. 64, no. 2, pp. 1–16, 2015.
44. G. Csardi and T. Nepusz, “The igraph software package for complex network research,”