Hide and Seek: Outwitting Community Detection Algorithms
Shravika Mittal*, Debarka Sengupta*†, Tanmoy Chakraborty*
*Dept. of CSE, †Dept. of Comp. Biology, IIIT-Delhi, India
{shravika16093, debarka, tanmoy}@iiitd.ac.in
Abstract: The community affiliation of a node plays an important role in determining its contextual position in the network, which may raise privacy concerns when a sensitive node wants to hide its identity in a network. Oftentimes, a target community seeks to protect itself from adversaries so that its constituent members remain hidden inside the network. The current study focuses on hiding such sensitive communities so that the community affiliation of the targeted nodes can be concealed. This leads to the problem of community deception, which investigates the avenues of minimally rewiring nodes in a network so that a given target community maximally hides from a community detection algorithm. We formalize the problem of community deception and introduce NEURAL, a novel method that greedily optimizes a node-centric objective function to determine the rewiring strategy. Our theoretical analysis restricts the number of strategies that can be employed to optimize the objective function, which in turn reduces the overhead of choosing the best strategy from multiple options. We also show that our objective function is submodular and monotone. When tested on both synthetic and 7 real-world networks, NEURAL is able to deceive 6 widely used community detection algorithms. We benchmark its performance against 4 state-of-the-art methods on 4 evaluation metrics. Additionally, our qualitative analysis on 3 other attributed real-world networks reveals that NEURAL, quite strikingly, captures important meta-information about edges that otherwise could not be inferred by observing only their topological structures.
Index Terms: Community detection, community hiding, permanence, complex networks
I. INTRODUCTION
Detecting communities in large networks has remained one of the major research problems of the last two decades. Different heuristics, metrics, and optimization techniques have been proposed to detect communities in multiple types of networks [1]. However, little effort has been made to understand how easily a community detection algorithm can be deceived by minimal rewiring of nodes. In this paper, we ask a fundamental question:
How do we hide a target community from being exposed to a community detection algorithm, assuming limited rewiring operations are allowed?
In other words, can a node or a community disguise its position in the network in order to escape detection [2]? We call this problem
Hide and Seek Community (HSC). Answering this question matters since it helps social network users hide their identity from online surveillance [4]. It also helps law-enforcement organizations identify criminal acts that involve deceptive online identities [5]. It may also be useful for counter-terrorism units that want to deploy spies into a terrorist network: a solution to this problem would help the spies determine whom they should befriend (edge addition) or which existing friendships they should try to break (edge deletion) to conceal their community identity. One may argue that the same method can be misused by adversaries. Nonetheless, we believe that our investigation brings these issues to light and can guide the design of novel community detection methods that are robust to deception strategies.
Mislove et al. [3] showed how, by breaking down the Facebook user network and the attributes of certain users, it is possible to gather private data about other Facebook users.
Fig. 1. Flow diagram showing the procedure of NEURAL.
To date, the fundamental question stated above has received very little attention, as most of the focus has been on building efficient algorithms for community detection. Nagaraja [6] made a pioneering attempt to examine the degree of network information required by an attacker to infer community membership. Recently, Waniek et al. [2] proposed a heuristic-based solution to evade network centrality analysis. Fionda and Pirrò [5] proposed a novel metric and greedily optimized it to hide the members of a target community from community detection algorithms. Liu et al. [7] proposed an approach to maximally hide the entire community structure (as opposed to a target community) with minimum rewiring of the network structure.
Here, we pose the HSC problem as a constrained optimization problem. The objective function is designed based on
Permanence [8], a node-centric metric we proposed previously, which has proved highly effective in detecting the entire community structure of a network. Being a local metric, Permanence uses limited information about a node to determine its community membership. We theoretically prove that only two types of edge update operations (inter-community edge addition and intra-community edge deletion) are useful for rewiring nodes to optimize our proposed objective function. We further show that the objective function is submodular and monotone w.r.t. the required edge updates. Therefore, we propose
NEURAL (NEtwork deception Using peRmAnence Loss), a greedy algorithm to optimize the objective function. Given a network $G$, its community structure $CS$ obtained from a community detection algorithm $CDA$, and a target community $C$ whose constituent nodes $V_C$ need to be concealed, NEURAL rewires nodes within the rewiring budget $\beta$ in such a way that $CDA$ is unable to identify the original community affiliation of $V_C$ (Fig. 1 shows a flow diagram). Extensive experiments are conducted on both synthetic and 7 real-world networks. Six widely used community detection algorithms are considered for deception. We compare NEURAL with 4 state-of-the-art community deception methods. Performance is measured with 4 evaluation metrics, two of which we propose. Our quantitative analysis shows that NEURAL significantly outperforms the other methods across all the datasets and all the evaluation metrics.
We further conduct a detailed qualitative analysis to explain the physical significance of the edges selected by the deception methods on three attributed real-world networks: a citation network, a terrorist network, and a breast cancer network. Surprisingly, we observe that NEURAL is able to capture important meta-information of edges that otherwise could not be inferred just by observing the topological structure of networks.
In short, our major contributions are four-fold:
• Novel objective function: Our proposed objective function is novel in that it uses minimal information about nodes for network rewiring.
• Novel algorithm: We propose NEURAL, a novel greedy optimization algorithm for community deception.
• Quantitative evaluation: We perform an extensive evaluation on multiple datasets and show that NEURAL outperforms existing approaches in hiding the target community within the specified budget.
• Qualitative evaluation: We further interpret the edges selected for node rewiring by the deception methods and show that NEURAL captures important meta-information of edges in three real-world networks.
Reproducibility: The code and datasets are available at https://github.com/mittalshravika/HideAndSeek-NEURAL.
II. RELATED WORK
Community Detection: There has been a plethora of research on detecting communities in a given network. This includes traditional clustering-based algorithms such as hierarchical clustering, partitional clustering, and spectral clustering, which group nodes together based on a similarity metric [9]. Another class of community detection algorithms revolves around the optimization of metrics that define the quality of a network partition, such as modularity [10], [11], conductance [12], cut-ratio [13], etc. A few other methods are based on random walks [14], information theory [15], [16], and spectral algorithms [17], [18]. Algorithms that detect overlapping communities have also been proposed [19], [20]. A detailed study of community detection algorithms can be found in [21], [22].
TABLE I
COMPARISON OF NEURAL WITH EXISTING METHODS. ABBREVIATIONS: E+,−: EDGE ADDITION AND DELETION; NC: NODE CENTRALITY; E→C: EDGES CONNECTED WITH THE NODES IN THE TARGET COMMUNITY C; QA: QUALITATIVE ANALYSIS.
Method         Metric       Strategy   Knowledge   QA
Nagaraja [6]   Modularity   E+         NC          No
DICE [2]       Modularity   E+,−       E→C         No
SADDEN [5]     Safeness     E+,−       E→C         No
NEURAL         Permanence   E+,−       E→C         Yes
Community Deception: Another area of interest that has emerged very recently is community deception, i.e., hiding a target community or the entire community structure from community detection algorithms. Nagaraja [6] proposed a counter-detection method for hiding a community by adding edges under a certain budget. The endpoints of the edges to be added are chosen using vertex centrality measures (degree centrality, eigenvector centrality, and random initialization). Waniek et al. [2] proposed DICE, an algorithm inspired by the functioning of modularity that deletes intra-community edges (disconnect internal) and adds inter-community edges (connect external). The authors also devised a metric to quantify the concealment of a target community in the network. Fionda and Pirrò [5] referred to the problem of hiding a community as community deception. They devised a greedy optimization algorithm (dubbed SADDEN henceforth) to hide a target community based on safeness gain, a new metric that they proposed to quantify how safe a node is under adversarial attack. To deceive community detection algorithms, SADDEN requires knowledge of the local community rather than of the entire community structure of the network. The authors also proposed a metric, called deception score, to quantify the effect of a community deception algorithm on the network, and showed that their method outperforms modularity-based approaches. Recently, Liu et al. [7] extended the problem of hiding a target community to hiding the entire community structure. They proposed an information-theoretic algorithm for community structure deception based on network entropy minimization.
We use the methods mentioned above (Nagaraja, DICE, and SADDEN) as baselines, along with a random edge rewiring method. We exclude Liu et al. [7] because their method focuses on deceiving the entire community structure (instead of a single target community); moreover, the metric used in their method (community-based structural entropy) requires information about the entire community structure.
To our knowledge, these are the only existing methods that have attempted to solve the HSC problem.
How is NEURAL different from others? Table I summarizes how NEURAL differs from the existing methods for community deception. NEURAL uses Permanence as a metric to determine how to update a given network efficiently in order to hide the target community. We perform a comprehensive evaluation on both synthetic and real-world networks using four different evaluation metrics. We further perform a qualitative analysis on 3 attributed networks to understand the significance of the selected edges.
III. PROBLEM FORMULATION
A. Preliminaries
A network $G = (V, E)$ is defined as an undirected graph with $V$ as the set of vertices and $E$ as the set of edges. Applying a community detection algorithm to $G$ yields the community structure $CS = (C_1, C_2, \ldots, C_k)$. We only consider non-overlapping communities. For a community $C \in CS$, an intra-community edge $\langle u, v \rangle$ is one with $u, v \in C$, and an inter-community edge $\langle u, v \rangle$ is one with $u \in C$ and $v \in C'$, where $C \cap C' = \emptyset$. $E_{intra}(C)$ (resp. $E_{inter}(C)$) denotes the set of intra- (resp. inter-) community edges corresponding to $C$.
B. Hide and Seek Community (HSC)
Our primary goal is to design an algorithm that, with minimum edge rewiring, is able to hide a given target community $C$ from a community detection method. In other words, the actual community membership of the nodes inside $C$ should not be revealed by the community detection method. This is done by rearranging the structure of the network using a certain number ($\beta$) of edge updates, which we call the budget for network rewiring. We assume that each edge update operation incurs a unit cost. One approach would be to exhaustively search the entire space of possible edge updates and select the ones that hide the target community $C$ the most. However, searching through the huge space of all possible combinations of edge updates would be computationally expensive for large networks. Moreover, such an exhaustive technique would require knowledge of the entire network and may also depend on the type of community detection algorithm that we intend to fool. To avoid this, we introduce the problem, called Hide and Seek Community, to camouflage a target community $C$ from a community detection method.
Definition 3.1 (Hide and Seek Community):
For a network $G = (V, E)$, the problem of Hide and Seek Community (HSC) is to hide a target community $C$ with the help of network edge updates constrained by a parameter $\beta$. It can be posed as a constrained optimization problem as follows:
$$\underset{E'(C)}{\arg\max}\; F\big(C, E(C), \beta, E'(C)\big) \tag{1}$$
where $E(C) = E_{intra}(C) \cup E_{inter}(C)$, $E'(C) = (E(C) \cup E_{add}) \setminus E_{del}$, and $E_{add}$ (resp. $E_{del}$) indicates the set of edges to be added (resp. deleted) to hide $C$ such that $|E_{add}| + |E_{del}| \le \beta$.
IV. METHODOLOGY
We use Permanence [8], [23], a node-centric metric, to design the objective function $F$ in (1). We theoretically show that only a limited set of edge update operations is useful for maximizing the Permanence loss (our objective function). We also show that Permanence loss is submodular and monotone w.r.t. each of these edge update operations. Therefore, we propose NEURAL, a greedy algorithm that uses Permanence loss in order to hide a target community $C$. This section first briefly describes Permanence, followed by the greedy strategy used in NEURAL.
A. Permanence
Chakraborty et al. proposed Permanence [8], [23], a vertex-centric metric that quantifies the containment of a node $v$ in a network community $C$. The formulation of Permanence is based on three factors: (i) the internal pull $I(v)$, given by the number of internal connections of $v$ within its own community; (ii) the maximum external pull $E_{max}(v)$, given by the maximum number of connections of $v$ to any single neighboring community; and (iii) the internal clustering coefficient $C_{in}(v)$, given by the ratio of the actual to the possible number of edges among the internal neighbors of $v$. These three factors are combined to obtain the Permanence of $v$ as
$$Perm(v, G) = \frac{I(v)}{E_{max}(v)} \times \frac{1}{deg(v)} - \big(1 - C_{in}(v)\big) \tag{2}$$
Fig. 2 shows a toy example of calculating the Permanence value of a node.
This metric indicates that a vertex remains in its own community as long as its internal pull is greater than the external pull, or its internal neighbors are densely connected to each other, forming a near clique. The Permanence of a network $G$ is then defined as $Perm(G) = \frac{\sum_{v \in V} Perm(v)}{|V|}$. We chose Permanence over other community scoring metrics such as (local) modularity [24], [25], conductance, and cut-ratio [1] for two reasons: (i) Permanence is a vertex-centric local metric, which enables us to update edges incrementally and change the network structure without looking into the entire network, and (ii) Permanence has been shown to be superior to other local and global scoring metrics for community detection [22].
B. Proposed Objective Function: Permanence Loss
Our proposed community deception method NEURAL (discussed in Section IV-D) aims to reduce the Permanence of the network so that a target community $C$ is hidden from community detection algorithms. Reducing the Permanence of a vertex disrupts its containment in its original community, which changes the community structure of the network and makes it difficult for detection algorithms to identify the original communities. We search for edge updates (addition/deletion of edges) by maximizing the Permanence loss at every iteration, defined as
$$P_l = Perm(G) - Perm(G') \tag{3}$$
where $G$ represents the original network, and $G'$ represents the network modified by edge updates (see Fig. 1) w.r.t. the target community $C$, as elaborated in Sections IV-C and IV-D. In Section IV-C, we show that Permanence loss is affected positively only by (i) intra-community edge deletion and (ii) inter-community edge addition. Readers are referred to the supplementary material, where we show that Permanence loss is submodular and monotone w.r.t. each of these edge updates.
Permanence can also be computed for an entire network.
Fig. 2. A toy example demonstrating the calculation of Permanence for a node, given the network and the community structure.
C. Edge Updates
In this section, we analyze four possible edge update operations for maximizing the Permanence loss $P_l$ in NEURAL: inter- and intra-community edge deletion, and inter- and intra-community edge addition.
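To make the four cases concrete, the following minimal Python sketch computes Permanence (2) and Permanence loss (3) on a small hypothetical graph and evaluates one update of each type. It is not the authors' implementation: the toy graph and the helper names (`build_adj`, `node_perm`, `graph_perm`, `perm_loss`) are illustrative, and it assumes two conventions not stated in this section, namely $E_{max}(v) = 1$ for a node with no external neighbors and $C_{in}(v) = 0$ for a node with fewer than two internal neighbors. Exact fractions keep the signs of small losses free of rounding noise.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical toy graph: community 0 = {0, 1, 2, 3} (edge 0-3 missing),
# community 1 = {4, 5, 6, 7} (a 4-clique), inter-community edges 3-4 and 2-5.
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4), (2, 5)]
COMM = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_perm(v, adj, comm):
    """Perm(v) = I(v) / (E_max(v) * deg(v)) - (1 - C_in(v)), as in (2)."""
    deg = len(adj[v])
    if deg == 0:
        return Fraction(0)                      # assumption: isolated node scores 0
    internal = [u for u in adj[v] if comm[u] == comm[v]]
    ext = {}                                    # external neighbour count per community
    for u in adj[v]:
        if comm[u] != comm[v]:
            ext[comm[u]] = ext.get(comm[u], 0) + 1
    e_max = max(ext.values(), default=1)        # assumption: 1 if no external edges
    k = len(internal)
    c_in = Fraction(0)                          # assumption: 0 if < 2 internal neighbours
    if k >= 2:
        links = sum(1 for a, b in combinations(internal, 2) if b in adj[a])
        c_in = Fraction(links, k * (k - 1) // 2)
    return Fraction(k, e_max * deg) - (1 - c_in)


def graph_perm(adj, comm):
    """Perm(G): average node Permanence."""
    return sum(node_perm(v, adj, comm) for v in adj) / len(adj)


def perm_loss(adj, comm, u, v, op):
    """P_l = Perm(G) - Perm(G') for adding (op="add") or deleting (op="del") <u, v>."""
    new = {x: set(nbrs) for x, nbrs in adj.items()}
    if op == "add":
        new[u].add(v)
        new[v].add(u)
    else:
        new[u].discard(v)
        new[v].discard(u)
    return graph_perm(adj, comm) - graph_perm(new, comm)


adj = build_adj(EDGES)
for label, (u, v, op) in [("inter-community deletion", (3, 4, "del")),
                          ("intra-community deletion", (1, 2, "del")),
                          ("inter-community addition", (0, 4, "add")),
                          ("intra-community addition", (0, 3, "add"))]:
    print(f"{label}: P_l = {float(perm_loss(adj, COMM, u, v, op)):+.4f}")
```

On this graph the four updates give a negative, positive, positive, and negative loss, respectively, in line with the theorems that follow. The intra-community addition of ⟨0, 3⟩ raises the internal clustering coefficient of nodes 1 and 2 and the internal pull of node 3, so the network's Permanence goes up rather than down.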
1) Inter-community Edge Deletion:
Theorem 4.1:
Deleting an inter-community edge $\langle u, v \rangle$, where $u \in C$ and $v \in C'$ such that $C \cap C' = \emptyset$, does not result in Permanence loss.
Proof:
In this proof, we show that deleting an inter-community edge does not lead to Permanence loss. An inter-community edge deletion only affects the Permanence of $u$ and $v$. We show the change in Permanence for node $u$ only (the same applies to $v$). There are two cases:
(i) $E_{max}(u)$ does not change after edge deletion: In this case, the maximum external connections of node $u$ remain the same after deleting $\langle u, v \rangle$. Deleting $\langle u, v \rangle$ does not change $C_{in}(u)$; it only decreases the degree of $u$ by 1. Therefore, for Permanence loss, we need to check whether $P_l = Perm(u, G) - Perm(u, G') \ge 0$. This reduces to
$$P_l = \frac{I(u)}{E_{max}(u)} \times \left[\frac{1}{deg(u)} - \frac{1}{deg(u) - 1}\right] < 0$$
Therefore, no Permanence loss is possible in this case.
(ii) $E_{max}(u)$ changes after edge deletion: In this case, the deletion of $\langle u, v \rangle$ affects the maximum external connections of node $u$. This happens when $C'$ is the only community that exerts the maximum external pull on $u$. As a result, along with the degree, $E_{max}(u)$ also decreases by 1, while $C_{in}(u)$ is unchanged. Therefore, for Permanence loss, we need to check whether $P_l = Perm(u, G) - Perm(u, G') \ge 0$. This reduces to
$$P_l = I(u)\left[\frac{1}{E_{max}(u) \times deg(u)} - \frac{1}{(E_{max}(u) - 1) \times (deg(u) - 1)}\right] = I(u)\left[\frac{1 - E_{max}(u) - deg(u)}{E_{max}(u) \times deg(u) \times (E_{max}(u) - 1) \times (deg(u) - 1)}\right] < 0$$
since the numerator is negative ($E_{max}(u) \ge 1$ and $deg(u) \ge 1$ because of edge $\langle u, v \rangle$). Therefore, no Permanence loss is possible when deleting an inter-community edge.
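The two closed forms above can be sanity-checked with exact rational arithmetic. The sketch below is our own illustration with hypothetical function names: it evaluates only the internal-external pull term $I(u)/(E_{max}(u) \times deg(u))$ of (2), since $C_{in}(u)$ is unchanged in both cases, and confirms over a small grid of admissible values with $I(u) \ge 1$ that $P_l < 0$ in both cases and that case (ii) matches the simplified fraction.

```python
from fractions import Fraction


def pull(i, e_max, deg):
    """Internal-external pull term of (2): I(u) / (E_max(u) * deg(u))."""
    return Fraction(i, e_max * deg)


def loss_case_i(i, e_max, deg):
    # Case (i): E_max(u) unchanged, deg(u) drops by 1.
    return pull(i, e_max, deg) - pull(i, e_max, deg - 1)


def loss_case_ii(i, e_max, deg):
    # Case (ii): both E_max(u) and deg(u) drop by 1.
    return pull(i, e_max, deg) - pull(i, e_max - 1, deg - 1)


def closed_form_ii(i, e_max, deg):
    # Simplified right-hand side derived for case (ii).
    return Fraction(i * (1 - e_max - deg),
                    e_max * deg * (e_max - 1) * (deg - 1))


def check_theorem_4_1(limit=8):
    for i in range(1, limit):                  # I(u) >= 1
        for e in range(1, limit):              # E_max(u) >= 1
            # Case (i): C' is not the unique maximum, so deg(u) >= I(u) + E_max(u) + 1.
            for d in range(i + e + 1, i + e + limit):
                assert loss_case_i(i, e, d) < 0
            if e >= 2:                         # case (ii) needs E_max(u) - 1 >= 1
                for d in range(i + e, i + e + limit):
                    assert loss_case_ii(i, e, d) == closed_form_ii(i, e, d)
                    assert loss_case_ii(i, e, d) < 0
    return True


print(check_theorem_4_1())
```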
2) Intra-community Edge Deletion:
Theorem 4.2:
Deleting an intra-community edge $\langle u, v \rangle$, where $u, v \in C$, always results in Permanence loss.
Proof:
Here we show that deleting an intra-community edge always results in Permanence loss. We show the change in Permanence for node $u$ only (the same applies to $v$). Such an edge update decreases both the internal degree and the degree of node $u$ by 1. It does not affect $E_{max}(u)$, since no external connections change. We narrow our search space to edges whose deletion decreases $C_{in}(u)$, so the clustering term of (2) can only add to the loss. Therefore, for Permanence loss, we need to check whether $P_l = Perm(u, G) - Perm(u, G') \ge 0$. For the internal-external pull term, this reduces to
$$P_l = \frac{1}{E_{max}(u)}\left[\frac{I(u)}{deg(u)} - \frac{I(u) - 1}{deg(u) - 1}\right] = \frac{1}{E_{max}(u)}\left[\frac{deg(u) - I(u)}{deg(u) \times (deg(u) - 1)}\right] \ge 0 \quad (\text{as } deg(u) \ge I(u)) \tag{4}$$
Therefore, deleting an intra-community edge $\langle u, v \rangle$ brings in Permanence loss in terms of nodes $u$ and $v$.
The intra-community edge deletion also affects the Permanence of nodes that have both $u$ and $v$ as internal neighbors: their internal clustering coefficient changes while all other factors remain unchanged. For Permanence loss due to such a node $w$, we need to check whether $P_l = Perm(w, G) - Perm(w, G') \ge 0$. This reduces to
$$P_l = \big(1 - C'_{in}(w)\big) - \big(1 - C_{in}(w)\big) = C_{in}(w) - C'_{in}(w)$$
where $C'_{in}(w)$ represents the updated internal clustering coefficient of $w$. Here, $P_l > 0$ since $C_{in}(w) > C'_{in}(w)$: the number of internal neighbors of $w$ is intact, but the number of edges among them drops by 1 after $\langle u, v \rangle$ is deleted. Therefore, deleting $\langle u, v \rangle$ also brings in Permanence loss in terms of their common neighbors.
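Both parts of this argument can be checked numerically in the same spirit. The hypothetical helpers below compare the pull-term loss of $u$ with the closed form in (4), and compute the loss of a common internal neighbor $w$ that has $k$ internal neighbors and `links` edges among them, one of which is removed; as in the text, all other factors of $w$ are assumed unchanged.

```python
from fractions import Fraction


def pull_loss_intra_delete(i, e_max, deg):
    """Pull-term loss when u loses one internal neighbour: left side of (4)."""
    return Fraction(i, e_max * deg) - Fraction(i - 1, e_max * (deg - 1))


def closed_form_4(i, e_max, deg):
    """Right-hand side of (4)."""
    return Fraction(deg - i, e_max * deg * (deg - 1))


def common_neighbour_loss(k, links):
    """C_in(w) - C'_in(w) when one of the `links` edges among w's k internal
    neighbours is removed (k >= 2, links >= 1)."""
    pairs = k * (k - 1) // 2
    return Fraction(links, pairs) - Fraction(links - 1, pairs)


def check_theorem_4_2(limit=8):
    for i in range(1, limit):                       # I(u) >= 1 because of <u, v>
        for e in range(1, limit):
            for d in range(max(i, 2), i + limit):   # deg(u) >= I(u) and deg(u) >= 2
                loss = pull_loss_intra_delete(i, e, d)
                assert loss == closed_form_4(i, e, d) and loss >= 0
                assert (loss == 0) == (d == i)      # zero only if u has no external edges
    for k in range(2, limit):
        pairs = k * (k - 1) // 2
        for links in range(1, pairs + 1):
            assert common_neighbour_loss(k, links) == Fraction(1, pairs) > 0
    return True


print(check_theorem_4_2())
```

The check also makes the boundary case visible: when all neighbors of $u$ are internal ($deg(u) = I(u)$), the pull term of (4) is exactly zero, so any loss must then come from the clustering terms.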
3) Inter-community Edge Addition:
Theorem 4.3:
Adding an inter-community edge $\langle u, v \rangle$, where $u \in C$ and $v \in C'$ such that $C \cap C' = \emptyset$, always results in Permanence loss. The loss is greater if $C'$ is the community that exerts the maximum external pull on node $u$.
Proof:
In this proof, we show that adding an inter-community edge always causes Permanence loss. An inter-community edge addition only affects the Permanence of nodes $u$ and $v$. We show the change in Permanence for node $u$ only (the same applies to $v$). There are two cases:
(i) $E_{max}(u)$ does not change after edge addition: In this case, the maximum external connections of node $u$ remain the same after adding $\langle u, v \rangle$. Adding $\langle u, v \rangle$ has no effect on $C_{in}(u)$; it only increases the degree of $u$ by 1. So, for Permanence loss, we need to check whether $P_l = Perm(u, G) - Perm(u, G') \ge 0$. This reduces to
$$P_l = \frac{I(u)}{E_{max}(u)} \times \left[\frac{1}{deg(u)} - \frac{1}{deg(u) + 1}\right] > 0 \tag{5}$$
Therefore, adding an inter-community edge causes Permanence loss when $E_{max}(u)$ does not change.
(ii) $E_{max}(u)$ changes after edge addition: In this case, the addition of $\langle u, v \rangle$ affects the maximum external connections of node $u$. This happens when $C'$ is the community that exerts the maximum external pull on $u$. As a result, along with the degree, $E_{max}(u)$ also increases by 1, while $C_{in}(u)$ is unchanged. Therefore, for Permanence loss, we need to check whether $P_l = Perm(u, G) - Perm(u, G') \ge 0$. This reduces to
$$P_l = I(u)\left[\frac{1}{E_{max}(u) \times deg(u)} - \frac{1}{(E_{max}(u) + 1) \times (deg(u) + 1)}\right] = I(u)\left[\frac{E_{max}(u) + deg(u) + 1}{E_{max}(u) \times deg(u) \times (E_{max}(u) + 1) \times (deg(u) + 1)}\right] > 0 \tag{6}$$
Therefore, adding an inter-community edge causes Permanence loss when $E_{max}(u)$ changes.
Theorem 4.4:
The Permanence loss is greater in case (6) (i.e., an edge added to the neighboring community from which $u$ experiences the maximum external pull) than in case (5).
Proof:
Comparing the Permanence loss in (5) and (6), for $I(u) > 0$ we get
$$I(u)\left[\frac{E_{max}(u) + deg(u) + 1}{E_{max}(u) \times deg(u) \times (E_{max}(u) + 1) \times (deg(u) + 1)}\right] \ge \frac{I(u)}{E_{max}(u)} \times \left[\frac{1}{deg(u) \times (deg(u) + 1)}\right] \Leftrightarrow \frac{E_{max}(u) + deg(u) + 1}{E_{max}(u) + 1} \ge 1 \Leftrightarrow deg(u) \ge 0$$
which is true.
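A similar exact-arithmetic sketch (hypothetical helper names; pull term only, since $C_{in}(u)$ is unchanged by an inter-community addition) confirms that both (5) and (6) are positive for $I(u) \ge 1$, that each matches its simplified fraction, and that the loss in (6) is never smaller than the loss in (5), as Theorem 4.4 states.

```python
from fractions import Fraction


def pull(i, e_max, deg):
    """Internal-external pull term of (2): I(u) / (E_max(u) * deg(u))."""
    return Fraction(i, e_max * deg)


def loss_eq5(i, e_max, deg):
    # (5): E_max(u) unchanged, deg(u) grows by 1.
    return pull(i, e_max, deg) - pull(i, e_max, deg + 1)


def loss_eq6(i, e_max, deg):
    # (6): E_max(u) and deg(u) both grow by 1.
    return pull(i, e_max, deg) - pull(i, e_max + 1, deg + 1)


def check_theorems_4_3_and_4_4(limit=8):
    for i in range(1, limit):                  # I(u) >= 1
        for e in range(1, limit):              # E_max(u) >= 1
            for d in range(i + e, i + e + limit):
                l5, l6 = loss_eq5(i, e, d), loss_eq6(i, e, d)
                assert l5 == Fraction(i, e * d * (d + 1)) > 0
                assert l6 == Fraction(i * (e + d + 1), e * d * (e + 1) * (d + 1)) > 0
                assert l6 >= l5                # Theorem 4.4
    return True


print(check_theorems_4_3_and_4_4())
```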
4) Intra-community Edge Addition:
Theorem 4.5:
Adding an intra-community edge $\langle u, v \rangle$, where $u, v \in C$, does not always result in Permanence loss.
Proof:
Here we show that adding an intra-community edge does not always result in Permanence loss. We show the change in Permanence for node $u$ only (the same applies to $v$). For this, we consider the two parts of Permanence separately: (i) the ratio of internal to external pull, denoted by $Perm_1(G)$, and (ii) the cohesiveness of internal neighbors, denoted by $Perm_2(G)$.
(i) Impact on the ratio of internal to external pull: Here, we consider the effect of adding an intra-community edge $\langle u, v \rangle$ on the internal-external pull factor of Permanence. This update increases both the internal degree and the degree of node $u$ by 1 and has no effect on the maximum external connections $E_{max}(u)$. Therefore, for Permanence loss, we need to check whether $P_l = Perm(u, G) - Perm(u, G') \ge 0$. This reduces to
$$P_l = \frac{1}{E_{max}(u)}\left[\frac{I(u)}{deg(u)} - \frac{I(u) + 1}{deg(u) + 1}\right] = \frac{1}{E_{max}(u)}\left[\frac{I(u) - deg(u)}{deg(u) \times (deg(u) + 1)}\right] \le 0 \quad (\text{as } I(u) \le deg(u))$$
Therefore, there is no Permanence loss w.r.t. the internal-external pull (the first part of (2)).
Fig. 3. An example demonstrating that the Permanence loss in terms of the cohesiveness of the internal neighbors of a node $u$ may not always be positive.
(ii) Impact on the cohesiveness of internal neighbors: Here, we consider the effect of adding an intra-community edge $\langle u, v \rangle$ on the cohesiveness of the internal neighbors of $u$. For Permanence loss, we need to check whether $P_l = Perm(u, G) - Perm(u, G') \ge 0$. This reduces to
$$P_l = \big(1 - C'_{in}(u)\big) - \big(1 - C_{in}(u)\big) = C_{in}(u) - C'_{in}(u)$$
where $C'_{in}(u)$ represents the updated internal clustering coefficient of $u$ in $G'$. $P_l$ can be positive or negative depending on how the connections among the internal neighbors of $u$ change after introducing its new neighbor $v$. This is shown using a toy example in Fig. 3: in the first case, $P_l < 0$, while in the second case, $P_l > 0$.
By combining (i) and (ii), we conclude that intra-community edge addition does not always ensure Permanence loss.
D. Proposed Algorithm: NEURAL
Since our objective function is submodular and monotone w.r.t. the edge updates that affect Permanence loss positively, we propose NEURAL, a greedy algorithm that maximizes Permanence loss to rewire nodes within a given budget in order to hide the target community. NEURAL uses the edge updates discussed in the previous section to rewire the network so that community detection algorithms are not able to detect a target community $C$. Along with the network, it takes as input $\beta$, the budget, i.e., the maximum number of edge updates allowed. The pseudo-code of NEURAL is shown in Algorithm 1 (flow diagram in Fig. 1). At every iteration, it selects the edge update that contributes the maximum loss in Permanence for the network, thereby greedily updating the original network. For edge addition, we only consider adding inter-community edges, following Theorems 4.3 and 4.5, since adding an intra-community edge does not guarantee a loss in Permanence in all cases (lines 4-6 of Algorithm 1). For edge deletion, we only consider deleting intra-community edges, following Theorems 4.1 and 4.2 (lines 7-9 of Algorithm 1); deleting an inter-community edge never results in Permanence loss, so it is not a favorable update. Since NEURAL follows a greedy strategy, among all the competing inter-community edges for addition, the one with the highest Permanence loss for the network is considered. The same approach is followed for selecting the best intra-community edge for deletion.
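The greedy loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: instead of the `getBest*` helpers of Algorithm 1, it exhaustively scores every inter-community addition incident to $C$ and every intra-community deletion inside $C$; it recomputes $Perm(G)$ from scratch for each candidate, which forgoes the locality that NEURAL exploits; it keeps the community assignment fixed throughout; and it reuses the conventions $E_{max}(v) = 1$ without external neighbors and $C_{in}(v) = 0$ with fewer than two internal neighbors. The toy graph and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def node_perm(v, adj, comm):
    """Perm(v) as in (2); assumes E_max = 1 without external edges and
    C_in = 0 with fewer than 2 internal neighbours."""
    deg = len(adj[v])
    if deg == 0:
        return Fraction(0)
    internal = [u for u in adj[v] if comm[u] == comm[v]]
    ext = {}
    for u in adj[v]:
        if comm[u] != comm[v]:
            ext[comm[u]] = ext.get(comm[u], 0) + 1
    e_max = max(ext.values(), default=1)
    k = len(internal)
    c_in = Fraction(0)
    if k >= 2:
        links = sum(1 for a, b in combinations(internal, 2) if b in adj[a])
        c_in = Fraction(links, k * (k - 1) // 2)
    return Fraction(k, e_max * deg) - (1 - c_in)


def graph_perm(adj, comm):
    return sum(node_perm(v, adj, comm) for v in adj) / len(adj)


def updated(adj, u, v, add):
    """Copy of adj with edge <u, v> added (add=True) or deleted (add=False)."""
    new = {x: set(n) for x, n in adj.items()}
    if add:
        new[u].add(v)
        new[v].add(u)
    else:
        new[u].discard(v)
        new[v].discard(u)
    return new


def neural_sketch(adj, comm, target, beta):
    """Greedy rewiring in the spirit of Algorithm 1; returns (new_adj, history)."""
    members = sorted(v for v in adj if comm[v] == target)
    history = []
    for _ in range(beta):
        base = graph_perm(adj, comm)
        best_add, best_del = (0, None), (0, None)     # (loss, edge)
        # Inter-community additions (Theorem 4.3): u in C, x outside C, not yet adjacent.
        for u in members:
            for x in sorted(adj):
                if comm[x] != target and x not in adj[u]:
                    loss = base - graph_perm(updated(adj, u, x, True), comm)
                    if loss > best_add[0]:
                        best_add = (loss, (u, x))
        # Intra-community deletions (Theorem 4.2): existing edges inside C.
        for u, x in combinations(members, 2):
            if x in adj[u]:
                loss = base - graph_perm(updated(adj, u, x, False), comm)
                if loss > best_del[0]:
                    best_del = (loss, (u, x))
        # Lines 10-13: prefer the addition on ties; skip if neither loss is positive.
        if best_add[1] is not None and best_add[0] >= best_del[0]:
            adj = updated(adj, *best_add[1], True)
            history.append(("add", best_add[1]))
        elif best_del[1] is not None:
            adj = updated(adj, *best_del[1], False)
            history.append(("del", best_del[1]))
        # As in Algorithm 1, one unit of budget is consumed per iteration.
    return adj, history


EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (4, 7),
         (5, 6), (5, 7), (6, 7), (3, 4), (2, 5)]
COMM = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
adj0 = {}
for a, b in EDGES:
    adj0.setdefault(a, set()).add(b)
    adj0.setdefault(b, set()).add(a)

new_adj, history = neural_sketch(adj0, COMM, target=0, beta=2)
print(history)
print(float(graph_perm(adj0, COMM)), "->", float(graph_perm(new_adj, COMM)))
```

On this toy graph, the first update chosen is the intra-community deletion of ⟨1, 2⟩, whose loss of 41/96 exceeds that of any single inter-community addition, because removing it also lowers the internal clustering coefficient of the common neighbors 0 and 3.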
Finally, a choice between the best inter-community edge to be added and the best intra-community edge to be deleted is made based on which one contributes more to the network's Permanence loss (lines 10-13 of Algorithm 1). Note that to compute the best network update at every iteration, we only need information about a subset of the nodes in the network, which reduces the amount of network information used.
E. Time Complexity of NEURAL
The time complexity of NEURAL is $O(|V_C| + |E_C|)$, where $|V_C|$ and $|E_C|$ represent the number of nodes and edges (both intra-community and inter-community) of the target community $C$, respectively. This is because, to search for the edge updates that contribute most to the Permanence loss for hiding $C$, we only need to go through the nodes and edges of the target community, as shown in Section IV-C. Information about the rest of the network is not required. We analyze the running time of NEURAL further in the supplementary material.
Algorithm 1
NEURAL: Network Deception using Permanence Loss
Input: (i) Network G, (ii) target community C, (iii) budget β
Output: Updated network G'
1: P_l,add = 0
2: P_l,del = 0
3: while β > 0 do
4:   add_u, maxComm_u = getBestNodeForAddition(C)   ▷ uses (6)
5:   add_v = getBestExternalNodeForAddition(maxComm_u)
6:   P_l,add = getEdgeAdditionLoss(add_u, add_v)
7:   intraEdge ← getConnectingEdges(C)
8:   del_u, del_v = getBestEdgeForDeletion(intraEdge)   ▷ uses (4)
9:   P_l,del = getEdgeDeletionLoss(del_u, del_v, C)
10:  if P_l,add ≥ P_l,del and P_l,add > 0 then
11:    G ← (V, E ∪ {⟨add_u, add_v⟩})
12:  else if P_l,del > 0 then
13:    G ← (V, E \ {⟨del_u, del_v⟩})
14:  end if
15:  β = β − 1
16: end while
17: return G
V. EXPERIMENTAL SETUP
In this section, we briefly describe the datasets, the baseline methods, the community detection methods we consider for deception, and the evaluation metrics. We then elaborate on the experimental results and the case studies.
TABLE II
STATISTICS OF THE REAL-WORLD NETWORKS (|V| AND |E| REPRESENT THE NUMBER OF NODES AND EDGES, RESPECTIVELY; ⟨k⟩ (k_max) REPRESENTS THE AVERAGE (MAXIMUM) DEGREE OF NODES).
Network   |V|       |E|         ⟨k⟩    k_max
Kar       34        78          4.59   17
Dol       62        159         5.13   12
Lesmis    77        154         6.60   36
Polbook   105       441         8.40   25
Adjnoun   112       425         7.60   49
Power     4,941     6,594       2.67   19
Dblp      317,080   1,049,866   4.93   343
A. Synthetic and Real-world Networks
We conduct experiments on two types of networks: (i) Synthetic networks:
We use the LFR benchmark [26] and vary the following parameters to generate synthetic networks: N, the number of nodes, and µ, the ratio of a node's external connections to its degree. The other parameters are set to their defaults as in the original implementation. Unless otherwise stated, we use the following setting to generate the default synthetic network: N = 10,…, µ = 0.… (as suggested in [8]). (ii) Real-world networks: We use seven real-world networks: (1) Zachary's Karate Club (Kar), (2) Dolphin social network (Dol), (3) Les Miserables (Lesmis), (4) Books about US Politics (Polbook), (5) Word adjacencies (Adjn), (6) US Power Grid (Power), and (7) DBLP collaboration network (Dblp). Table II summarizes the statistics of the networks. Note that we do not require the ground-truth community structure, since our primary aim is to deceive a community detection algorithm so that, after rewiring, the community affiliation of the target nodes remains unrevealed.
We compare NEURAL with four baseline methods:
1) Random: updates the network by randomly selecting the type of edge update (edge addition/deletion), along with the end nodes.
2) Nagaraja [6]: updates the network by adding edges between nodes selected on the basis of vertex centrality measures.
3) DICE [2]: updates the network by randomly adding inter-community edges or deleting intra-community edges.
4) SADDEN [5]: updates the network by greedily maximizing the safeness gain at every iteration of edge update.
C. Community Detection Algorithms
We consider six diverse and widely used community detection algorithms: Louvain (Louv) [27], WalkTrap (Walk) [14], Greedy [28], InfoMap (Info) [15], Label Propagation (Labprop) [29], and Leading Eigenvectors (Eig) [30]. Note that none of these algorithms uses Permanence as a metric for optimization; therefore, NEURAL is agnostic to the underlying mechanism of these algorithms.
~mejn/netdata/
http://snap.stanford.edu/data/
TABLE III
COMPARISON ON THE DEFAULT LFR NETWORK, KEEPING β = 0.…|V_C|, WHERE |V_C| IS THE SIZE OF THE TARGET COMMUNITY.
Method     NMI    MNMI   CommS   CommU
Random     0.99   0.97   1.13    0.15
Nagaraja   0.99   0.28   1.21    0.86
DICE
D. Evaluation Metrics
Here, we briefly describe the metrics used to evaluate the community deception methods. ↑ (resp. ↓) indicates that the higher (resp. lower) the value of the metric, the better the performance.
(i) Normalized Mutual Information (NMI) ↓ [31]: To check how well the deception methods hide a particular target community C in the network, we calculate the NMI score between the original community structure of the network, CS = (C_1, C_2, ..., C_k), and the new community structure obtained by a community detection algorithm on the updated network, CS' = (C'_1, C'_2, ..., C'_{k'}). The metric ranges from 0 (suggesting no overlap between CS and CS') to 1 (suggesting a complete overlap between CS and CS').
(ii) Modified Normalized Mutual Information (MNMI) ↓: For large networks, hiding a target community C may not have a major effect on the other communities that are not in immediate contact with C. Therefore, to capture how effective a deception method is in hiding C, we measure NMI only over the community memberships of the nodes in the target community and their immediate neighbors, before and after the edge updates. We call this metric MNMI. Its range is the same as that of NMI.
(iii) Community Splits (CommS) ↑: We propose this metric to count the number of communities in CS' that contain nodes of the target community C in the updated network G'. It ranges from 1 (all nodes in C remain in one community of CS') to at most |CS'| (the nodes in C get distributed across different communities of CS'). The higher the value of CommS, the wider the split of the nodes in C, and hence the stronger the deception of the target community.
$$\mathrm{CommS} = \sum_{C'_i \in CS'} h(C'_i, C), \qquad h(C'_i, C) = \begin{cases} 1 & \text{if } V_C \cap V_{C'_i} \neq \emptyset \\ 0 & \text{if } V_C \cap V_{C'_i} = \emptyset \end{cases}$$
where V_C represents the set of nodes belonging to C, and V_{C'_i} represents the set of nodes belonging to community C'_i ∈ CS'.
(iv) Community Uniformity (CommU) ↑: We propose this metric to capture how the nodes in the target community C get distributed among the communities of the new community structure CS'. It is the entropy of the distribution of the target community's nodes over the communities in CS':
$$\mathrm{CommU} = \sum_{C'_i \in CS'} -\frac{|V_{C,C'_i}|}{|V_C|} \log \frac{|V_{C,C'_i}|}{|V_C|},$$
where |V_{C,C'_i}| is the number of nodes of C present in C'_i ∈ CS' (communities containing no node of C contribute 0), and |V_C| is the total number of nodes in C. It ranges from 0 (all nodes of C remain in one community of CS') up to log |CS'|, which is reached only when the nodes of C are spread evenly over all communities of CS'.
VI. QUANTITATIVE EVALUATION
Here we present the quantitative analysis of the experimental results on both synthetic and real-world networks.
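Before turning to the results, the four metrics defined in Section V-D can be made concrete in a few lines. The sketch below is ours, not the authors' code: it assumes partitions are dicts from node to community label, uses natural logarithms, and uses the 2I(X;Y)/(H(X)+H(Y)) normalisation of NMI from Danon et al. [31]; for MNMI it assumes the neighbours are taken in the original graph.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of community labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def nmi(x, y):
    """NMI = 2 I(X;Y) / (H(X) + H(Y)); x and y label the same nodes in the same order."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    mi = sum(c / n * math.log(c * n / (px[a] * py[b]))
             for (a, b), c in Counter(zip(x, y)).items())
    hx, hy = entropy(x), entropy(y)
    return 1.0 if hx + hy == 0 else 2 * mi / (hx + hy)


def nmi_partitions(before, after, nodes=None):
    """NMI between two partitions (dict node -> label), optionally restricted to `nodes`."""
    nodes = sorted(before) if nodes is None else sorted(nodes)
    return nmi([before[v] for v in nodes], [after[v] for v in nodes])


def mnmi(before, after, target, adj):
    """MNMI: NMI over the target nodes plus their neighbours in the original graph (assumption)."""
    scope = set(target)
    for v in target:
        scope |= adj[v]
    return nmi_partitions(before, after, scope)


def comm_s(target, after):
    """CommS: number of communities of CS' containing at least one target node."""
    return len({after[v] for v in target})


def comm_u(target, after):
    """CommU: entropy of how the target nodes are spread over the communities of CS'."""
    return entropy([after[v] for v in target])


# Two triangles {0,1,2} and {3,4,5} joined by the edge (2, 3); {0,1,2} is the target.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
before = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
after = {0: "a", 1: "c", 2: "b", 3: "b", 4: "b", 5: "b"}  # hypothetical CS'
target = [0, 1, 2]
print(nmi_partitions(before, after), mnmi(before, after, target, adj))
print(comm_s(target, after), comm_u(target, after))
```

In this toy case the target is split across three communities, so CommS is 3 and CommU equals log 3, its maximum for three target nodes.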
Fig. 4. NMI, MNMI and CommS on the default synthetic network by varying β in (a)-(c) and µ in (d)-(f), keeping β = 0 . |V_C|, where |V_C| is the number of nodes in the target community.
A. Evaluation on Synthetic Networks
We use the default LFR network, set the budget β as a fraction of the number of nodes in the target community C, and vary this fraction from 0.1 to 0.6. The results are averaged over 20 synthetic networks, 5 randomly selected target communities, and 10 runs for each target community. Figs. 4(a)-(c) show that as β increases, NEURAL hides C better (NMI and MNMI scores decrease and CommS scores increase), showing that a larger budget translates into stronger community deception.
We further conduct experiments by varying the parameter µ of the LFR network from 0.1 to 0.9. Figs. 4(d)-(f) show that as µ increases, NEURAL conceals the nodes in C better (NMI and MNMI scores decrease and CommS scores increase). This matches the expectation that it is easier to hide a target community whose intra-community connections are sparse relative to its inter-community connections.
Table III shows that, on the default synthetic network, NEURAL delivers accuracy comparable to (and sometimes better than) the baselines.
B. Evaluation on Real-world Networks
Fig. 5. (Color online) Composite performance of the five competing community deception methods based on (a) NMI, (b) MNMI, (c) CommS, and (d) CommU. Bars in each group (under each dataset) are ordered as follows: (1) Random, (2) Nagaraja, (3) DICE, (4) SADDEN, and (5) NEURAL (as shown in Fig. 5(c)). Fig. 5(e) shows the composite performance of each competing method based on every evaluation metric, averaged over all the datasets.
In the experiments on real-world networks, we fix β to a fixed fraction of the size of the target community C (i.e., β = 0 . |V_C|). The reported results are averaged over all communities, taking each community as the target community in turn, and over 10 runs for each target community.
For a compact visualization, we rank the five competing community deception methods as follows: for each evaluation metric and each community detection algorithm, we normalize their scores (using min-max normalization) so that the best performing method gets a score of 1. Hence, if a method outperforms all others in deceiving all six community detection algorithms w.r.t. a given evaluation metric, it secures a composite score of 6.
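A minimal sketch of the composite scoring described above, assuming raw scores are stored as nested dicts keyed by community detection algorithm and then by deception method. The paper does not say how ties are handled, so the tie rule below is our assumption. A method that is best under every detector receives a composite score equal to the number of detectors.

```python
def composite_scores(scores, lower_is_better):
    """Composite score of each deception method for one metric on one dataset.

    scores: {detector: {method: raw metric value}}.
    For each detector, scores are min-max normalised so that the best method
    gets 1 and the worst 0 (flipped when lower raw values are better), then
    the normalised scores are summed over detectors.
    """
    total = {}
    for per_method in scores.values():
        lo, hi = min(per_method.values()), max(per_method.values())
        for method, value in per_method.items():
            if hi == lo:
                norm = 1.0  # assumption: if all methods tie, each counts as best
            else:
                norm = (value - lo) / (hi - lo)
                if lower_is_better:
                    norm = 1.0 - norm
            total[method] = total.get(method, 0.0) + norm
    return total


# Hypothetical NMI values (lower is better) for three methods under two detectors.
nmi_scores = {
    "Louv": {"A": 0.90, "B": 0.80, "C": 0.95},
    "Info": {"A": 0.85, "B": 0.70, "C": 0.85},
}
print(composite_scores(nmi_scores, lower_is_better=True))
```

Here method B has the lowest NMI under both detectors and so obtains the maximum composite score of 2.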
Figs. 5(a)-(d) show the composite performance for each evaluation metric, and Fig. 5(e) shows the composite performance of each competing method averaged over all the datasets. NEURAL outperforms the others by a significant margin: it achieves a composite score of . (averaged over all the evaluation metrics and datasets), outperforming Random, Nagaraja, DICE and SADDEN by . , . , . , and . , respectively (see the supplementary material for the raw accuracy scores on all the datasets). SADDEN turns out to be highly competitive, sometimes showing a marginal improvement over NEURAL; in general, however, NEURAL is better than the others (Fig. 5(e)). We also consider hiding individual target nodes instead of communities; see the supplementary material for details.
C. Non-uniform Budget for Edge Updates
So far, we have reported results with a single budget β shared by all edge update operations. In this section, we extend NEURAL to a non-uniform budget, in which separate budget constraints apply to the two allowed types of edge update (as elaborated in Section IV-D): (i) β_D for intra-community edge deletion, and (ii) β_A for inter-community edge addition. Such an analysis is useful when the cost of deleting an intra-community edge differs from that of adding an inter-community edge. We perform experiments under two settings of (β_D, β_A): (i) β_D = 0 . β, β_A = 0 . β, and (ii) β_D = 0 . β, β_A = 0 . β (we fix β at its default value, i.e., β = 0 . |V_C|). Tables IV and V report raw accuracy values (averaged over all the communities) for NEURAL and SADDEN (the best baseline, extended in the same manner) on the Karate network (see the supplementary material for the other networks) under these two settings. We observe that NEURAL outperforms SADDEN in most cases.
VII. QUALITATIVE EVALUATION
To interpret the rewirings suggested by NEURAL and SADDEN (the top two methods), we further consider three real-world attributed networks. Unless otherwise stated, we only consider deletion of edges (as adding a new edge is not meaningful for these networks). The Louvain algorithm is used for community detection, and the largest community is taken as the target community.
TABLE IV
Accuracy of two competing community deception methods, (1) S: SADDEN (best baseline) and (2) N: NEURAL, over Karate, such that β_D = 0 . β; β_A = 0 . β
Comm. Det. Algo. NMI (S N) MNMI (S N) CommS (S N) CommU (S N)
Louv 0.94
Walk 0.78
Greedy 0.77
Info 0.83
Labprop 0.84
TABLE V
Accuracy of two competing community deception methods, (1) S: SADDEN (best baseline) and (2) N: NEURAL, over Karate, such that β_D = 0 . β; β_A = 0 . β
Comm. Det. Algo. NMI (S N) MNMI (S N) CommS (S N) CommU (S N)
Louv 0.94
Walk 0.79
Greedy 0.81
Info
Labprop 0.68
A. Citation Network
Fig. 6. (a) Age and (b) similarity score distribution of edges selected by NEURAL and SADDEN from the Citation and Terrorist networks, respectively.
We consider , papers published in the Physical Review journals as nodes and , citation interactions among them (ignoring directionality) as edges (dataset: https://journals.aps.org/datasets). After hiding the target community (the largest community), we observe that NEURAL tends to pick citation interactions (edges) whose age, defined as the difference between the publication years of the citing and cited papers, is relatively high. We believe these edges matter more for keeping the identity of the target community intact, since they connect to papers (nodes) published earlier than most in the literature. NEURAL performs better than SADDEN in terms of updating more edges of this kind (Fig. 6(a)).
We further measure the rank correlation (Spearman's ρ and Kendall's τ) between the edges selected and ranked by NEURAL and the same edges ranked by their age (ground truth), and likewise for the edges returned by SADDEN. Table VI shows that NEURAL outperforms SADDEN. Moreover, the three oldest edges in the target community appear within the top 20 of NEURAL's ranked list, whereas SADDEN does not return a single such edge among the 138 edges it returns.
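The rank correlations above can be sketched in plain Python. This is an illustration under our own assumptions, not the authors' evaluation script: ties receive average ranks for Spearman's ρ, Kendall's τ is computed in its tau-b variant, and edges are compared by their position in a method's returned list against their age, with ages negated so that the oldest edge counts as rank 1.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) corrected for ties in x and y."""
    n = len(x)
    conc = disc = tie_x = tie_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])
            dy = (y[i] > y[j]) - (y[i] < y[j])
            tie_x += dx == 0
            tie_y += dy == 0
            if dx * dy > 0:
                conc += 1
            elif dx * dy < 0:
                disc += 1
    n0 = n * (n - 1) // 2
    return (conc - disc) / math.sqrt((n0 - tie_x) * (n0 - tie_y))


# Hypothetical ages (citing year minus cited year) of five edges, listed in the
# order a deception method returned them (position 1 = returned first).
ages = [40, 35, 12, 20, 5]
positions = list(range(1, len(ages) + 1))
neg_ages = [-a for a in ages]  # oldest edge gets the smallest value, i.e. rank 1
print(spearman(positions, neg_ages), kendall_tau_b(positions, neg_ages))
```

Only one pair of edges is out of age order here, so both coefficients are high but below 1; a method that returned edges strictly from oldest to newest would score exactly 1.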
B. Terrorist Network
We use the Global Terrorism Database to create a network of terrorist group associations. This dataset consists of , terrorist events around the world between 1970 and 2018. To create the network, we compute the similarity between two terrorist groups based on their activities, using five attributes: (i) severity of the attack (number of casualties), (ii) attacking strategy used in the majority of events, (iii) type of weapon used in the majority of events, (iv) peak year of attacks, and (v) target type in the majority of events. Two terrorist groups are linked if their similarity score is greater than or equal to 2.5 (out of 5). This yields a network with , nodes (terrorist groups) and , unweighted edges (association links among these groups).
After hiding the target community (the largest community), we observe that NEURAL first picks edges with higher similarity scores, i.e., it removes links between highly similar terrorist groups. NEURAL performs better than SADDEN in terms of returning more edges with high similarity scores (Fig. 6(b)). We further measure the rank correlation between the edges returned and ranked by NEURAL and the same edges ranked by their similarity scores (ground truth), and likewise for the edges returned by SADDEN. Table VI shows that NEURAL once again outperforms SADDEN in terms of returning edges with high similarity scores.
C. Breast Cancer Network
Breast cancer is considered a leading cause of morbidity and mortality among women worldwide. About 12% of women in the United States are diagnosed with breast cancer during their lifetime [32]. Alteration of gene regulation has been widely studied in this context [33], with a special focus on dynamic changes in gene co-expression modules. Under the Cancer Genome Atlas (TCGA) program, a community-scale effort has been directed towards multi-omic molecular profiling of breast tumors in hundreds of patients [34]. We use Fragments Per Kilobase of transcript per Million mapped reads (FPKM) normalized gene expression data from TCGA to understand whether disguising community affiliation plays a role in the pathogenesis of critical diseases such as cancer. To this end, we construct a control and a cancer-specific co-expression network based on the transcriptomic profiles of 1,097 normal (as controls) and tumor samples obtained from the TCGA repository. Both networks span the same set of 1,000 genes (1,000 nodes). Two nodes are connected by an edge when the Pearson's correlation coefficient computed across the entire spectrum of control/tumor samples meets a cut-off value of . ( , edges). Deleterious mutations in cancer cause widespread loss-of-function events, which are often manifested as changes in gene expression levels.
Note: this analysis was conducted by two professional biologists.
TABLE VI
Rank correlation for the Citation and Terrorist networks, and accuracy for the Breast Cancer network.
Method Citation (Spearman's ρ, Kendall's τ) Terrorist (Spearman's ρ, Kendall's τ)
SADDEN 0.12 0.21 0.00 0.07
NEURAL
Rank correlations are statistically significant with p-value < . 
Method MAP F1 score nDCG AUC
SADDEN 0.004 0.20 0.47 0.30
NEURAL
We employ NEURAL and SADDEN to retrieve co-expressions (edges) whose disappearance fosters community disintegration. NEURAL and SADDEN pinpoint and rewirings in the form of edge deletion, respectively, which can be cross-validated against the cancer-specific network. Quite strikingly, out of the correctly predicted deletions by NEURAL, harbor BMP2 inducible kinase (BMP2K) as one of the nodes. We find definitive studies implicating this molecule in breast cancer [35]. We fail to find any literature support for the novel gene Z97832.2, which was relatively enriched ( out of rewirings) among the SADDEN-predicted rewirings.
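The co-expression construction and the cross-validation step can be sketched as follows. This is a toy illustration, not the authors' pipeline: the cut-off, the use of the absolute correlation, and the reading of "ground truth" as control edges that are absent from the cancer-specific network are all our assumptions, and only precision, recall and F1 are computed (MAP, nDCG and AUC from Table VI are not reproduced).

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def coexpression_edges(expr, cutoff):
    """Link two genes when |r| >= cutoff across samples (absolute value is an assumption)."""
    return {frozenset((g, h)) for g, h in combinations(sorted(expr), 2)
            if abs(pearson(expr[g], expr[h])) >= cutoff}


def validate_deletions(predicted, control_edges, cancer_edges):
    """Score predicted deletions against control edges missing from the cancer network."""
    truth = control_edges - cancer_edges
    hits = set(predicted) & truth
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy expression profiles (gene -> values over four samples); not TCGA data.
control = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
cancer = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 2, 4]}
E_control = coexpression_edges(control, cutoff=0.9)
E_cancer = coexpression_edges(cancer, cutoff=0.9)
predicted = [frozenset(("g1", "g3")), frozenset(("g1", "g2"))]  # a method's deletions
print(validate_deletions(predicted, E_control, E_cancer))
```

In the toy data, g3 loses its correlation with g1 and g2 in the "cancer" profiles, so those two control edges form the ground truth and one of the two predicted deletions is a hit.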
We also measure how accurately NEURAL and SADDEN predict the ground-truth edges w.r.t. the cancer-specific network. Table VI shows that NEURAL outperforms SADDEN on the four evaluation measures. We therefore conclude that NEURAL-led investigation of genome-scale molecular networks holds significant promise for understanding genetic diseases such as cancer.
VIII. CONCLUSION
This paper addressed the problem of community deception, i.e., preventing community detection algorithms from discovering the community affiliation of the nodes in a target community. Our major contributions are as follows: (i) we formalized the problem and called it Hide and Seek Community (HSC); (ii) we proposed a novel objective function (Permanence loss) and analyzed it theoretically; (iii) we proposed NEURAL, a novel greedy strategy to optimize Permanence loss; (iv) NEURAL turned out to be more efficient than the baselines; and (v) NEURAL unfolded meta-information about edges that could not otherwise have been explained by analyzing the network structure alone. In particular, NEURAL showed promise in the analysis of genome-scale molecular networks.
ACKNOWLEDGEMENT
The work was partially supported by the Ramanujan Fellowship and DST (ECR/2017/00l691). T. Chakraborty would like to acknowledge the support of CAI, IIIT-Delhi.
REFERENCES
[1] S. Fortunato and D. Hric, “Community detection in networks: A user guide,” Physics Reports, vol. 659, pp. 1–44, 2016.
[2] M. Waniek, T. P. Michalak, M. J. Wooldridge, and T. Rahwan, “Hiding individuals and communities in a social network,” Nature Human Behaviour, vol. 2, no. 2, pp. 139–147, 2018.
[3] A. Mislove, B. Viswanath, K. P. Gummadi, and P. Druschel, “You are who you know: inferring user profiles in online social networks,” in WSDM, 2010, pp. 251–260.
[4] T. Ji, C. Luo, Y. Guo, Q. Wang, L. Yu, and P. Li, “Community detection in online social networks: A differentially private and parsimonious approach,” IEEE Transactions on Computational Social Systems, vol. 7, no. 1, pp. 151–163, 2020.
[5] V. Fionda and G. Pirrò, “Community deception or: How to stop fearing community detection algorithms,” IEEE TKDE, vol. 30, no. 4, pp. 660–673, 2018.
[6] S. Nagaraja, “The impact of unlinkability on adversarial community detection: Effects and countermeasures,” in PETS, 2010, pp. 253–272.
[7] Y. Liu, J. Liu, Z. Zhang, L. Zhu, and A. Li, “REM: From structural entropy to community structure deception,” in NIPS, 2019, pp. 12918–12928.
[8] T. Chakraborty, S. Srinivasan, N. Ganguly, A. Mukherjee, and S. Bhowmick, “On the permanence of vertices in network communities,” in SIGKDD, 2014, pp. 1396–1405.
[9] K. Berahmand, A. Bouyer, and M. Vasighi, “Community detection in complex networks by detecting and expanding core nodes through extended local similarity of nodes,” IEEE Transactions on Computational Social Systems, vol. 5, no. 4, pp. 1021–1033, 2018.
[10] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” PRE, vol. 69, no. 2, 2004.
[11] M. Chen, K. Kuzmin, and B. K. Szymanski, “Community detection via maximization of modularity and its variants,” IEEE Transactions on Computational Social Systems, vol. 1, no. 1, pp. 46–65, 2014.
[12] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, “Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters,” Internet Mathematics, vol. 6, no. 1, pp. 29–123, 2009.
[13] J. Leskovec, K. J. Lang, and M. Mahoney, “Empirical comparison of algorithms for network community detection,” in WWW, 2010, pp. 631–640.
[14] P. Pons and M. Latapy, “Computing communities in large networks using random walks,” in Computer and Information Sciences - ISCIS, P. Yolum, T. Güngör, F. Gürgen, and C. Özturan, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005, pp. 284–293.
[15] M. Rosvall and C. T. Bergstrom, “Maps of random walks on complex networks reveal community structure,” PNAS, vol. 105, no. 4, pp. 1118–1123, 2008.
[16] M. Rosvall and C. T. Bergstrom, “An information-theoretic framework for resolving community structure in complex networks,” PNAS, vol. 104, no. 18, pp. 7327–7331, 2007.
[17] L. Donetti and M. A. Muñoz, “Detecting network communities: a new systematic and efficient algorithm,” JSTAT, vol. 2004, no. 10, p. P10012, 2004.
[18] A. Capocci, V. Servedio, G. Caldarelli, and F. Colaiori, “Detecting communities in large networks,” Physica A, vol. 352, no. 2, pp. 669–676, 2005.
[19] G. Palla, I. Derenyi, I. Farkas, and T. Vicsek, “Uncovering the overlapping community structure of complex networks in nature and society,” Nature, vol. 435, no. 7043, pp. 814–818, 2005.
[20] N. Alduaiji, A. Datta, and J. Li, “Influence propagation model for clique-based community detection in social networks,” IEEE Transactions on Computational Social Systems, vol. 5, no. 2, pp. 563–575, 2018.
[21] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[22] T. Chakraborty, A. Dalmia, A. Mukherjee, and N. Ganguly, “Metrics for community analysis: A survey,” ACM Comput. Surv., vol. 50, no. 4, pp. 1–37, 2017.
[23] T. Chakraborty, S. Srinivasan, N. Ganguly, A. Mukherjee, and S. Bhowmick, “Permanence and community structure in complex networks,” ACM TKDD, vol. 11, no. 2, pp. 1–34, 2016.
[24] M. E. J. Newman, “Modularity and community structure in networks,” PNAS, vol. 103, no. 23, pp. 8577–8582, 2006.
[25] S. Muff, F. Rao, and A. Caflisch, “Local modularity measure for network clusterizations,” PRE, vol. 72, no. 5, p. 056107, 2005.
[26] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” PRE, vol. 78, no. 4, p. 046110, 2008.
[27] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” JSTAT, vol. 2008, no. 10, 2008.
[28] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” PRE, vol. 70, p. 066111, 2004.
[29] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” PRE, vol. 76, no. 3, p. 036106, 2007.
[30] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” PRE, vol. 74, p. 036104, 2006.
[31] L. Danon, A. Díaz-Guilera, J. Duch, and A. Arenas, “Comparing community structure identification,” JSTAT, vol. 2005, no. 09, 2005.
[32] A. G. Waks and E. P. Winer, “Breast cancer treatment: a review,” JAMA, vol. 321, no. 3, pp. 288–300, 2019.
[33] D. Sengupta and S. Bandyopadhyay, “Topological patterns in microRNA–gene regulatory network: studies in colorectal and breast cancer,” Molecular BioSystems, vol. 9, no. 6, pp. 1360–1371, 2013.
[34] Cancer Genome Atlas Network, “Comprehensive molecular portraits of human breast tumours,” Nature, vol. 490, no. 7418, p. 61, 2012.
[35] S. Buraschi, T. Neill, R. T. Owens, L. A. Iniguez, G. Purkins, R. Vadigepalli, B. Evans, L. Schaefer, S. C. Peiper, Z.-X. Wang et al., “Decorin protein core affects the global gene expression profile of the tumor microenvironment in a triple-negative orthotopic breast carcinoma xenograft model,” PLoS ONE, vol. 7, no. 9, 2012.
Shravika Mittal is a senior undergraduate student in Computer Science and Engineering at IIIT-Delhi. Her research interests include Social Network Analysis, Network Science, and Natural Language Processing. She has been placed on the Dean's List for Excellence in Academics and for Innovation in Research and Development.
Debarka Sengupta received his Ph.D. from Jadavpur University. Before joining IIIT-D, he worked as an INSPIRE Faculty at the Indian Statistical Institute. He has consulted for and advised a number of technology and service-based firms, including IPsoft, Datanomers, CoreCompete and Applied Research Works, on various data science and business analytics projects. He has twice been nominated for the prestigious INSPIRE Faculty award, in 2014 and 2016.
Tanmoy Chakraborty is an Assistant Professor and a Ramanujan Fellow at the Dept. of Computer Science and Engineering, IIIT-Delhi, India, where he leads a research group called LCS2 (http://lcs2.iiitd.edu.in/). His primary research interests include Social Network Analysis, Data Mining, and Natural Language Processing. He has received several awards, including the Google Indian Faculty Award, the Early Career Research Award, and the DAAD Faculty award. More details at http://faculty.iiitd.ac.in/~tanmoy/.
Hide and Seek: Outwitting Community Detection Algorithms (Supplementary Materials)
Shravika Mittal, Debarka Sengupta, Tanmoy Chakraborty
{shravika16093, debarka, tanmoy}@iiitd.ac.in
IX. PERMANENCE LOSS IS SUBMODULAR AND MONOTONE
In this section, we prove that the proposed objective functionis submodular and monotone w.r.t. the number of edge updates.
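For reference, these are the standard definitions that the proofs below verify, stated for a set function f on a finite ground set U (here, U is the set of candidate edge updates and f = P_l). The pairwise form of submodularity used below is equivalent to the diminishing-returns form f(A ∪ {x}) − f(A) ≥ f(B ∪ {x}) − f(B) for all A ⊆ B ⊆ U and x ∉ B.

```latex
\begin{align*}
\text{submodular:}\quad & f(A \cup \{x_1\}) + f(A \cup \{x_2\}) \ge f(A \cup \{x_1, x_2\}) + f(A)
  && \forall A \subseteq U,\; x_1 \neq x_2,\; x_1, x_2 \in U \setminus A,\\
\text{monotone:}\quad & f(A) \le f(B) && \forall A \subseteq B \subseteq U.
\end{align*}
```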
Theorem 9.1: Permanence loss P_l is submodular w.r.t. the addition of an inter-community edge.
Proof: Let A be the set of inter-community edges being considered for addition to the network G using Theorem 4.3. Let x_1 and x_2 be two other such inter-community edges, where x_1, x_2 ∉ A. P_l(A ∪ {x_1}) (resp. P_l(A ∪ {x_2})) represents the permanence loss due to the addition of the inter-community edge set A ∪ {x_1} (resp. A ∪ {x_2}) (given by (6)). Since the loss in (6) is a sum over the added edges,
$$P_l(A \cup \{x_1\}) + P_l(A \cup \{x_2\}) = 2\sum_{i \in A} P_l(i) + P_l(x_1) + P_l(x_2) = P_l(A \cup \{x_1, x_2\}) + P_l(A).$$
The submodularity inequality P_l(A ∪ {x_1}) + P_l(A ∪ {x_2}) ≥ P_l(A ∪ {x_1, x_2}) + P_l(A) therefore holds with equality, which shows that Permanence loss is submodular w.r.t. the addition of an inter-community edge.
Theorem 9.2: Permanence loss P_l is submodular w.r.t. the deletion of an intra-community edge.
Proof: Let A be the set of intra-community edges being considered for deletion from the network G using Theorem 4.2. Let x_1 and x_2 be two other such intra-community edges such that x_1, x_2 ∉ A. P_l(A ∪ {x_1}) (resp. P_l(A ∪ {x_2})) represents the permanence loss due to the deletion of the set of intra-community edges A ∪ {x_1} (resp. A ∪ {x_2}) (given by (4)). Since the loss in (4) is a sum over the deleted edges,
$$P_l(A \cup \{x_1\}) + P_l(A \cup \{x_2\}) = 2\sum_{i \in A} P_l(i) + P_l(x_1) + P_l(x_2) = P_l(A \cup \{x_1, x_2\}) + P_l(A).$$
The submodularity inequality therefore holds with equality, which shows that Permanence loss is submodular w.r.t. the deletion of an intra-community edge.
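Both proofs rest on P_l being additive over edge updates. As a sanity check, the following minimal sketch assumes a hypothetical additive set function with made-up per-edge losses and brute-forces the pairwise submodularity inequality and monotonicity on a small ground set; it also shows that monotonicity additionally needs every per-edge loss to be non-negative.

```python
from itertools import combinations


def additive_loss(weights):
    """Hypothetical P_l: the loss of a set of edge updates is the sum of per-edge losses."""
    return lambda edges: sum(weights[e] for e in edges)


def is_submodular(f, ground, eps=1e-12):
    """Check f(A+x1) + f(A+x2) >= f(A+x1+x2) + f(A) for all A and x1 != x2 outside A."""
    ground = list(ground)
    for r in range(len(ground) + 1):
        for A in map(set, combinations(ground, r)):
            rest = [x for x in ground if x not in A]
            for x1, x2 in combinations(rest, 2):
                if f(A | {x1}) + f(A | {x2}) < f(A | {x1, x2}) + f(A) - eps:
                    return False
    return True


def is_monotone(f, ground, eps=1e-12):
    """Check f(A + x) >= f(A) for every A and every x outside A."""
    ground = list(ground)
    for r in range(len(ground) + 1):
        for A in map(set, combinations(ground, r)):
            for x in ground:
                if x not in A and f(A | {x}) < f(A) - eps:
                    return False
    return True


w = {"e1": 0.3, "e2": 0.1, "e3": 0.25}  # hypothetical non-negative per-edge losses
f = additive_loss(w)
print(is_submodular(f, w), is_monotone(f, w))
w_neg = dict(w, e4=-0.1)  # one update that would increase permanence
g = additive_loss(w_neg)
print(is_submodular(g, w_neg), is_monotone(g, w_neg))
```

An additive function is always submodular (with equality), but it is monotone only when no single update has a negative loss, which is the extra condition the monotonicity proofs rely on.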
Theorem 9.3: Permanence loss P_l is monotone w.r.t. the addition of an inter-community edge.
Proof: Let A be the set of inter-community edges obtained using Theorem 4.3. Let B = A ∪ {x}, where x is another inter-community edge considered for updating the network for community deception. Since A ⊆ B, using (6), P_l(B) = P_l(A) + P_l(x) ≥ P_l(A), as the loss P_l(x) contributed by the selected edge is non-negative. This proves that Permanence loss is monotone w.r.t. the addition of an inter-community edge.
Theorem 9.4: Permanence loss P_l is monotone w.r.t. the deletion of an intra-community edge.
Proof:
Let A be the set of intra-community edges obtained using Theorem 4.2. Let B = A ∪ {x}, where x is another intra-community edge considered for updating the network for community deception. Since A ⊆ B, using (4), P_l(B) = P_l(A) + P_l(x) ≥ P_l(A), as the loss P_l(x) contributed by the selected edge is non-negative. This proves that Permanence loss is monotone w.r.t. the deletion of an intra-community edge.
TABLE VII
Scores of two methods, (1) SADDEN and (2) NEURAL, for hiding individual target nodes, averaged across different runs.
Network Score (SADDEN) Score (NEURAL)
Kar 0.62
Dol 0.59
Lesmis 0.64
Polbook 0.71
Adjnoun 0.68
Power 0.54
Dblp 0.64
Fig. 7. Scalability analysis of NEURAL and SADDEN.
X. HIDING NODES RATHER THAN COMMUNITIES
In this section, we address a modified version of our problem statement: hiding individual target nodes instead of communities. To hide nodes, we employ NEURAL by treating nodes as singleton communities. Since a singleton community has no intra-community edges to delete, NEURAL only adds inter-community edges to hide target nodes. We evaluate our methodology by randomly selecting . |V| nodes as targets in each of the 7 real-world networks. The Louvain algorithm is used to obtain community assignments. Table VII summarises the probability scores of hiding the target nodes for NEURAL and SADDEN (the best baseline). We observe that NEURAL outperforms SADDEN by assigning a different community label to more target nodes.
XI. SCALABILITY ANALYSIS
The discussion in Section IV-D established that the time complexity of NEURAL is O(|V_C| + |E_C|). To verify this empirically, we use the LFR benchmark [26] to generate synthetic networks with µ = 0 . . The Louvain algorithm is used for community detection, and the largest community is taken as the target community, such that |E_C| lies within the range − . The results are averaged over 10 such synthetic networks. We record the run times of two deception strategies based on greedy optimization: (i) SADDEN and (ii) NEURAL (we fix β at its default, i.e., β = 0 . |V_C|). Fig. 7 shows that the run time of NEURAL increases linearly with |E_C|, thereby verifying the analytical time complexity derived in Section IV-D (since |E_C| >> |V_C|, O(|V_C| + |E_C|) ≈ O(|E_C|)). We also observe that as |E_C| increases, NEURAL outperforms SADDEN in terms of run time.
TABLE VIII
Accuracy of the five competing community deception methods, (1) Nag: Nagaraja, (2) R: Random, (3) D: DICE, (4) S: SADDEN, and (5) N: NEURAL, over real-world networks: (A) Karate, (B) Dolphin, (C) Lesmis, (D) Polbooks, (E) Adjn, (F) Power and (G) Dblp.
(A) Karate
Comm. Det. Algo. NMI (Nag R D S N) MNMI (Nag R D S N) CommS (Nag R D S N) CommU (Nag R D S N)
Louv 0.94 0.88 0.92 0.82
Info 0.89 0.85 0.85 0.72
Labprop 0.29 0.76 0.64 0.52
Eig 0.94 0.89 0.91 0.83
(B) Dolphin
Comm. Det. Algo. NMI (Nag R D S N) MNMI (Nag R D S N) CommS (Nag R D S N) CommU (Nag R D S N)
Louv 0.89 0.84 0.85 0.79
Walk 0.77 0.83 0.78
Info 0.92 0.91 0.87 0.85
Eig 0.84 0.84 0.88 0.83
(C) Lesmis
Comm. Det. Algo. NMI (Nag R D S N) MNMI (Nag R D S N) CommS (Nag R D S N) CommU (Nag R D S N)
Louv 0.96 0.90 0.89 0.89
Greedy 0.90 0.92 0.91 0.87
Info 0.98 0.98 0.95 0.94
Labprop 0.69 0.74 0.78 0.69
(D) Polbooks
Comm. Det. Algo. NMI (Nag R D S N) MNMI (Nag R D S N) CommS (Nag R D S N) CommU (Nag R D S N)
Louv 0.98 0.99 0.95
Walk 0.97 0.95 0.94
Greedy 0.97 0.93 0.92 0.95
Labprop 0.87 0.81 0.80 0.78
(E) Adjn
Comm. Det. Algo. NMI (Nag R D S N) MNMI (Nag R D S N) CommS (Nag R D S N) CommU (Nag R D S N)
Louv 0.74 0.74 0.71
Walk 0.97 0.99 0.95 0.95
Greedy 0.75 0.60 0.62 0.66
Info 0.50 0.45 0.51 0.57
Labprop 1.00 1.00 0.00 1.00
(F) Power
Comm. Det. Algo. NMI (Nag R D S N) MNMI (Nag R D S N) CommS (Nag R D S N) CommU (Nag R D S N)
Louv 0.97 0.98 0.99 0.98
Walk 0.98 0.99 0.97 0.93
Greedy 0.97 0.98 0.96 0.94
Info 0.98 0.98 0.98 0.97
Eig 0.95 0.95 0.93
(G) Dblp
Comm. Det. Algo. NMI (Nag R D S N) MNMI (Nag R D S N) CommS (Nag R D S N) CommU (Nag R D S N)
Louv 1.00 0.99 0.99 0.98
Walk 1.00 0.99 0.97 0.97
Greedy 0.99 1.00 0.99
Info 0.99 1.00 0.99 0.99
Eig 0.93 0.97 0.98 0.92
TABLE IX
Accuracy of two competing community deception methods, (1) S: SADDEN (best baseline) and (2) N: NEURAL, over real-world networks: (A) Dolphin, (B) Lesmis, (C) Polbooks, (D) Adjn, and (E) Power, such that β_D = 0 . β; β_A = 0 . β, β = 0 . |V_C|
(A) Dolphin
Comm. Det. Algo. NMI (S N) MNMI (S N) CommS (S N) CommU (S N)
Louv 0.81
Greedy 0.86
Info 0.84
Labprop 0.75
Eig 0.81
(B) Lesmis
Comm. Det. Algo. NMI (S N) MNMI (S N) CommS (S N) CommU (S N)
Louv 0.89
Walk 0.92
Info 0.95
Eig 0.95
(C) Polbooks
Comm. Det. Algo. NMI (S N) MNMI (S N) CommS (S N) CommU (S N)
Louv 0.95
Walk
Greedy 0.92
Info 0.94
Labprop 0.89
Eig 0.92
(D) Adjn
Comm. Det. Algo. NMI (S N) MNMI (S N) CommS (S N) CommU (S N)
Louv 0.63
Walk 0.96
Greedy 0.67
Info 0.57
Labprop 1.00
Eig
(E) Power
Comm. Det. Algo. NMI (S N) MNMI (S N) CommS (S N) CommU (S N)
Louv 0.98
Walk 0.94
Greedy 0.95
Info 0.98
Eig
TABLE X
Accuracy of two competing community deception methods, (1) S: SADDEN (best baseline) and (2) N: NEURAL, over real-world networks: (A) Dolphin, (B) Lesmis, (C) Polbooks, (D) Adjn, and (E) Power, such that β_D = 0 . β; β_A = 0 . β, β = 0 . |V_C|
(A) Dolphin
Comm. Det. Algo. NMI (S N) MNMI (S N) CommS (S N) CommU (S N)
Louv 0.91
Walk 0.71
Greedy
Info 0.87
Labprop 0.72
Eig
(B) Lesmis
Comm. Det. Algo. NMI (S N) MNMI (S N) CommS (S N) CommU (S N)
Louv 0.90
Walk 0.95
Info 0.95
Labprop 0.75
Eig 0.92
(C) Polbooks
Comm. Det. Algo. NMI (S N) MNMI (S N) CommS (S N) CommU (S N)
Louv
Walk
Greedy 0.95
Info 0.96
Labprop 0.82
Eig 0.91
(D) Adjn
Comm. Det. Algo. NMI (S N) MNMI (S N) CommS (S N) CommU (S N)
Louv
Walk 0.95
Greedy 0.65
Info 0.68
Eig
(E) Power
Comm. Det. Algo. NMI (S N) MNMI (S N) CommS (S N) CommU (S N)
Louv 0.98
Walk
Greedy 0.95
Info 0.98
Eig 0.66