Social Diffusion Sources Can Escape Detection
Marcin Waniek (a), Manuel Cebrian (b), Petter Holme (c), and Talal Rahwan (a)
(a) New York University Abu Dhabi, Abu Dhabi, UAE
(b) Max Planck Institute for Human Development, Berlin, Germany
(c) Tokyo Institute of Technology, Tokyo, Japan
Abstract
Influencing (and being influenced by) others indirectly through social networks is fundamental to all human societies. Whether this happens through the diffusion of rumors, viruses, opinions, or know-how, finding the source is of persistent interest to people and an algorithmic challenge of much current research interest. However, no study has considered the case of diffusion sources actively trying to avoid detection. By disregarding this possibility, we risk conflating intentional obfuscation with the fundamental limitations of source-finding algorithms. We close this gap by separating two mechanisms that hide diffusion sources: one stemming from the network topology itself and the other from strategic manipulation of the network. We find that identifying the source can be challenging even without foul play and that, in many cases, it is easy to evade source-detection algorithms further. We show that hiding connections that were part of the viral cascade is far more effective than introducing fake individuals. Thus, efforts should focus on exposing concealed ties rather than planted fake entities, e.g., bots in social media; such exposure would drastically improve our chances of detecting the source of a social diffusion.
As humans, we are perpetually involved in and affected by things spreading in networks, from infections [6, 10] to ideas, from financial distress to fake news [4, 24]. Furthermore, we live in an increasingly networked world: the networks of our society are becoming ever denser, meaning that spreading phenomena happen with accelerating speed [3]. Occasionally, we do not know who started the spreading. However, recent research shows that we can often infer the source with great accuracy [26, 21, 31]. If the spreading has an illicit intent or negative consequences for the source (as with bioterrorism, disinformation, or whistleblowing in an authoritarian society), the source would want to hide from such source detection algorithms. In this article, we study the conditions under which such hiding can be successful.
Throughout the article, we focus on two relevant applications, namely infectious diseases and disinformation in social media, even though our results apply more widely. Infectious diseases in particular provide us with an illustrative example. At the time of writing, the world is suffering from COVID-19, one of the worst epidemic outbreaks in living memory [12]. The very early stages of this outbreak are still not entirely mapped out [27]. While the most common hypothesis is that the virus crossed over to humans around the Huanan Seafood Market in Wuhan, China [20], this has been a subject of much conspiracy theorizing and speculation by politicians, journalists, and academics. If one knew the whereabouts of the earliest COVID-19 cases, source detection algorithms could be used to pinpoint the origin. However, could such methods rule out conspiracy theories claiming deliberate obfuscation of the origin?
The main challenge in understanding the possible obfuscation of the diffusion source is that it is a problem with two components.
First, networks have an innate ability to hide the source; although most research in the literature has focused on designing source detection algorithms, the efficiency of these algorithms strongly depends on the network structure. Second, by changing its local network surroundings, the source can hide its identity. Nevertheless, this also depends on many factors: the timing of the source detection, the dynamics of the spreading, and the network structure itself. So far, no theory has been able to separate these two components. This article builds such a theory from systematic simulations of many source detection scenarios, with and without active obfuscation of the source. For the sake of parsimony, we base our study on simple contagion, which models infectious diseases and probably some classes of information diffusion [24, 16], but use many different scenarios concerning network structure, source detection, and the timing of these.
We consider the problem of hiding the source of a diffusion process in a network. In particular, we consider an undirected network $G = (V, E)$, where one of the nodes, $v^\dagger \in V$, starts a diffusion process, resulting in a subset of nodes $I$ becoming infected. In this work we assume that this process follows the Susceptible-Infected (SI) model [22]; see Methods for details on the network notation and the SI model. We consider situations where $v^\dagger$ wishes to avoid being detected as the source of the diffusion process. Hence, we call the node $v^\dagger$ the evader. We also assume the existence of another entity, called the seeker, whose goal is to identify the origin of the diffusion using source detection algorithms. In our analysis, we consider source detection algorithms that return a ranking of network nodes [11, 32, 21, 2], where the node at the top position in the ranking is identified as the source; see Methods for more details.
The goal of the evader is then to introduce modifications to the network structure (after the diffusion has taken place) in order to avoid being identified by the seeker as the source. We consider the two (as we argue below) most realistic types of such modifications: (1) adding nodes and (2) modifying edges.
Next, we provide more details about the two types of modifications, starting with the one in which nodes are added to the network. For instance, these can be fake accounts created by the evader on a social media platform. We refer to these as “bots” throughout the article, although they do not necessarily have to be bots controlled or influenced by the evader, e.g., they could be the evader’s associates or fake accounts created by the evader. Then, the problem faced by the evader is to determine the contacts of each bot. Thus, although the evader wishes to hide by adding nodes, the optimization problem faced by the evader is to choose which edges, not nodes, to add to the network. Note that this is a variation of the well-known Sybil attack [13], where an entity affects a system by using multiple identities. We also consider an alternative way in which the evader may conceal their true nature as the source of the diffusion. Instead of adding bots to the network, the evader can modify (i.e., add or remove) the network edges after the diffusion has taken place. For instance, this could involve following certain accounts and unfollowing certain other accounts on a social media platform, hoping that such modifications would mislead source detection algorithms.
Figure 1 illustrates an example of the hiding process. The network on the left represents the original structure, i.e., the one in which the diffusion takes place. As can be seen, the evader (represented as the red node) is identified as the source of diffusion by both the Degree and the Closeness source detection algorithms (i.e., the evader occupies the top position in the rankings produced by both algorithms).
The networks on the right illustrate two possible scenarios in which the evader hides its identity. The evader could avoid detection by introducing newly infected nodes, as illustrated in the top-right network. After adding the two bots, the evader connects them to the nodes labeled C (i.e., having them share content posted by the nodes labeled C). Consequently, the evader drops to the third position in the rankings produced by both the Degree and Closeness algorithms. Alternatively, the evader could try to avoid detection by removing the edges between themselves and the two nodes labeled C, as illustrated in the bottom-right network (a possible interpretation of this action is removing those two nodes from the evader’s friends on a social media site). As a result, they drop to the third position in the ranking produced by the Degree source detection algorithm, and to the fifth position in the ranking produced by the Closeness source detection algorithm, thereby concealing their identity as the source of diffusion. As can be seen, by performing a relatively small number of network modifications, the evader can lower their chances of being identified as the source of diffusion.
The first question we investigate is: How difficult is it to find an optimal way of hiding the source of diffusion?
Formal definitions of the decision problems faced by the evader are presented in Appendix A.
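The setting described above can be made concrete with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes the common discrete-time variant of the SI model (in each round, every infected node infects each susceptible neighbor independently with probability `p`), and it assumes that a degree-based seeker ranks infected nodes by their degree within the infected subnetwork. The function names `simulate_si`, `degree_ranking`, and `position` are hypothetical.

```python
import random


def simulate_si(adj, source, p, rounds, rng):
    """Run a discrete-time SI process and return {node: round of infection}.

    adj maps each node to the set of its neighbors (undirected network).
    Assumption: infected nodes never recover and try to infect each
    susceptible neighbor once per round with probability p.
    """
    infected_at = {source: 0}
    for t in range(1, rounds + 1):
        newly_infected = set()
        for v in infected_at:
            for w in adj[v]:
                if w not in infected_at and rng.random() < p:
                    newly_infected.add(w)
        for w in newly_infected:
            infected_at[w] = t
    return infected_at


def degree_ranking(adj, infected):
    """Rank infected nodes by degree inside the infected subnetwork.

    Assumption: this stands in for a degree-based source detector; ties
    are broken by node label here only to keep the sketch deterministic.
    """
    score = {v: sum(1 for w in adj[v] if w in infected) for v in infected}
    return sorted(infected, key=lambda v: (-score[v], str(v)))


def position(ranking, node):
    """1-based position of a node in a ranking (1 = most likely source)."""
    return ranking.index(node) + 1


# Toy example: a star whose center starts the diffusion.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
infected = simulate_si(star, source=0, p=1.0, rounds=1, rng=random.Random(1))
evader_position = position(degree_ranking(star, set(infected)), 0)
```

In this toy star the evader is maximally exposed; the hiding strategies discussed in the article aim to push `evader_position` further down the ranking.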
[Figure 1 panels: the original network with its Degree and Closeness rankings; “Hiding by adding nodes”; “Hiding by modifying edges”. Legend: the evader; node infected at t = 1; node infected at t = 2; inactive node; node added by the evader; edge removed by the evader.]
Figure 1: An overview of the hiding process. The network on the left represents the original structure before the hiding process. Here, the red node is the evader, the blue nodes are those infected in the first round of diffusion, i.e., at time step $t = 1$, and the green nodes are those infected at $t = 2$. The gray frame next to this network depicts the rankings computed by two source detection algorithms, namely Degree and Closeness (nodes that have the same ranking were given the same label, e.g., see the two nodes labeled C). The numbers on the scale represent the score assigned by each algorithm, implying that the node positioned at the top is the most likely source according to that algorithm. The network in the top-right corner represents a possible scenario where the evader tries to hide by adding two bots and connecting them to the nodes labeled C. The network in the bottom-right corner represents another scenario where the evader tries to hide by removing the dotted edges. The gray frames next to these networks show how the rankings change as a result of these modifications.
Table 1 summarizes our theoretical findings regarding the computational complexity of these problems. The proofs of our theoretical results are presented in Appendix B. As can be seen, in almost all cases, the considered problems are NP-complete (Non-deterministic Polynomial-time complete), implying that no known algorithm can solve them in polynomial time. Hence, finding an optimal way of preventing the evader from being identified as the source of diffusion is computationally intractable, especially for large networks.
Given the computational complexity of identifying an optimal way of hiding the source of diffusion, we will now focus on heuristic methods instead. The first class of heuristics that we consider is adding bots to the network. We use the term “supporters” to describe the nodes already present in the network and willing to accept connections from the bots.
These supporters do not necessarily need to be the evader’s associates, and are not restricted to those who intentionally cooperate to hide the source of diffusion. For example, they could be laypeople who are susceptible to accepting friend requests from strangers on social media platforms. This is plausible, since it has been shown that around half of Facebook users are willing to accept friendship requests from strangers [28]. Then, to best conceal their identity, the evader must optimize the list of contacts of each bot, which can include any of the supporters and any of the other bots. Let us first consider how contacts are chosen from the list of supporters. Assuming that the evader wishes to connect each bot to $k$ supporters, we consider three alternative heuristics:
• Hub: for each bot, connect it to the $k$ supporters with the greatest degrees (this way, all bots get connected to the same $k$ supporters);
• Degree: for each bot, connect it to the $k$ supporters with the greatest degree out of those who are not yet connected to any other bot (if no such supporters exist, select from the ones connected to the smallest number of bots);
• Random: for each bot, connect it to $k$ supporters chosen uniformly at random.
Each of the above heuristics has two versions, depending on how the bots are connected to each other. In particular, we consider two possibilities:
• Just supporters: every bot is connected only to supporters, implying that there are no edges between bots;
• Clique: every bot is connected to every other bot, implying that the bots form a clique.
For each heuristic, we add the word “clique” to indicate that the bots are connected to each other, e.g., by writing “Hub clique”. Otherwise, if there are no connections between the bots, we write the name as it is, e.g., “Hub”.
Having presented our first class of heuristics, which add bots to the network, let us now consider the second class of heuristics, which modify edges in the network. We assume that the evader can only add or remove edges between themselves and a specific subset of nodes. To determine which of those nodes to connect to, and which to disconnect from, we consider three alternative heuristics:
• Max degree: choose the nodes with the greatest degree;
• Min degree: choose the nodes with the smallest degree;
• Random: choose nodes uniformly at random.
All ties are broken uniformly at random. Each of the above heuristics has two versions, depending on whether the evader is adding new connections or removing existing ones. We write the word “adding” to indicate that the evader is adding new connections, e.g., by writing “Adding max degree”. Otherwise, we write the word “removing” to indicate that the evader is removing existing connections, e.g., “Removing max degree”.
Table 1: Summary of our computational complexity results. For different source detection algorithms, we consider the decision problem that the evader must solve in order to hide optimally from the algorithm by either adding nodes or modifying edges. P = solvable in polynomial time; NP-complete = Non-deterministic Polynomial-time complete, implying that no known algorithm can solve it in polynomial time.
Source detection algorithm    Adding nodes    Modifying edges
Degree                        P               NP-complete
Closeness                     NP-complete     NP-complete
Betweenness                   NP-complete     NP-complete
Rumor                         NP-complete     NP-complete
Random Walk                   NP-complete     NP-complete
Monte Carlo                   NP-complete     NP-complete
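The two classes of heuristics listed above can be sketched as follows. This is an illustrative reading of the descriptions, not the authors' code: the function names `add_bots`, `pick_targets`, and `modify_evader_edges`, the bot labels, and the use of supporter degrees measured before any bot is attached are all assumptions. As a simplification, `add_bots` breaks ties by node label, whereas `pick_targets` breaks them at random.

```python
import random


def add_bots(adj, supporters, n_bots, k, heuristic="hub", clique=False, rng=None):
    """Return (new adjacency, bot labels) after attaching bots to supporters."""
    rng = rng or random.Random()
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}  # leave the input intact
    degree = {s: len(adj[s]) for s in supporters}        # degrees before hiding
    bot_links = {s: 0 for s in supporters}                # bots attached so far
    pool = sorted(supporters, key=str)
    bots = [("bot", i) for i in range(n_bots)]           # hypothetical labels
    for b in bots:
        if heuristic == "hub":        # the same top-k supporters for every bot
            chosen = sorted(pool, key=lambda s: -degree[s])[:k]
        elif heuristic == "degree":   # prefer supporters with the fewest bots
            chosen = sorted(pool, key=lambda s: (bot_links[s], -degree[s]))[:k]
        elif heuristic == "random":
            chosen = rng.sample(pool, k)
        else:
            raise ValueError(f"unknown heuristic: {heuristic}")
        new_adj[b] = set(chosen)
        for s in chosen:
            new_adj[s].add(b)
            bot_links[s] += 1
    if clique:                        # "clique" variants: connect all bot pairs
        for b in bots:
            new_adj[b].update(o for o in bots if o != b)
    return new_adj, bots


def pick_targets(adj, candidates, budget, heuristic, rng):
    """Choose the nodes the evader connects to or disconnects from."""
    pool = sorted(candidates, key=str)
    rng.shuffle(pool)  # random order first, so the stable sort breaks ties at random
    if heuristic == "max_degree":
        pool.sort(key=lambda v: -len(adj[v]))
    elif heuristic == "min_degree":
        pool.sort(key=lambda v: len(adj[v]))
    elif heuristic != "random":
        raise ValueError(f"unknown heuristic: {heuristic}")
    return pool[:budget]


def modify_evader_edges(adj, evader, targets, add):
    """Add (add=True) or remove (add=False) edges between evader and targets."""
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
    for t in targets:
        if add:
            new_adj[evader].add(t)
            new_adj[t].add(evader)
        else:
            new_adj[evader].discard(t)
            new_adj[t].discard(evader)
    return new_adj
```

For example, “Removing max degree” with a budget of one edge would correspond to `modify_evader_edges(adj, evader, pick_targets(adj, adj[evader], 1, "max_degree", rng), add=False)` under these assumptions.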
The experimental procedure for a given network $G = (V, E)$ is as follows. First, we select the evader $v^\dagger$ uniformly at random from the top of nodes according to degree ranking, provided that its degree is at least . Then, we spread the diffusion starting from $v^\dagger$ to obtain the set of infected nodes $I$. In our simulations, we use the SI model with the probability of diffusion being p = 0 . and the number of rounds being T = 5 . We then perform the hiding process using different heuristics, recording the position of $v^\dagger$ in the rankings generated by each source detection algorithm after each step of the hiding process.
[Figure 2 panels: rows A (before hiding), B (adding nodes), and C (modifying edges); x-axis: Size; y-axis: Average degree; color scales: Average rank (row A) and Change in rank (rows B and C). Hexagon sectors in row B: Degree, Hub, Random, Degree clique, Hub clique, Random clique. Hexagon sectors in row C: Add max degree, Add min degree, Add random, Delete max degree, Delete min degree, Delete random. A minus sign marks a heuristic with a negative change in the evader’s rank.]
Figure 2: The efficiency of hiding from the Eigenvector source detection algorithm in networks with varying structure, size and density. In each color-coded tessellation, the x-axis represents the number of nodes in the network, while the y-axis represents the average degree. The first column corresponds to scale-free networks generated using the Barabási-Albert (BA) model, the second corresponds to random networks generated using the Erdős-Rényi (ER) model, and the third corresponds to small-world networks generated using the Watts-Strogatz (WS) model. The results are presented as an average over networks and over evaders in each network. The colors of the first row (A) give the average position of the evader in the ranking computed using the Eigenvector source detection algorithm before the hiding process on a logarithmic scale (the lower the ranking, the more exposed is the evader). The second (B) and third (C) rows depict the average difference in the evader’s ranking as a result of the hiding process, where each hexagon indicates the efficiency of six different hiding heuristics, and the color represents the effectiveness of the best heuristic (on a logarithmic scale). More specifically, row (B) depicts the effect of adding bots connected to supporters each, while row (C) depicts the effect of either adding or removing edges, depending on the heuristic being used. Additionally, each triangular sector is filled up with a lighter shaded area (overlaid on the hexagons), representing the relative effectiveness of the corresponding heuristic compared to the best one (the greater the shaded area, the better the heuristic). As such, triangular sectors that have no shaded area have zero effectiveness. However, there are cases where the effectiveness is even less than zero, i.e., the heuristic backfires and ends up exposing the evader even more. In such cases, the triangular sector has no shaded area and is also marked with a minus.
The majority of the source detection algorithms considered in our experiments disregard the nodes that are not infected. Hence, when choosing the bots’ contacts, it makes sense for the evader to consider only infected supporters. In our simulations, we assume that all infected nodes other than the evader are supporters.
Furthermore, we assume that the evader can only remove edges that they are part of and only add edges between themselves and their neighbors’ neighbors.
In our experiments, we will first disentangle two different aspects of hiding. The first aspect relates to the network topology itself, which can provide some concealment even without the evader manipulating it. The second aspect comes from an evader strategically manipulating the network after the inception of the diffusion process. To separate the two notions of hiding, we ran experiments on networks with varying structure, size, and density; see Figure 2. The figure presents the results for the Eigenvector source detection algorithm, in particular. We chose this example since it is one of the best-performing algorithms and yields the most pronounced differences between the best and worst hiding heuristics; see Appendix C for other source detection algorithms. Figure 2A presents the results for the first notion of hiding, i.e., the one stemming from the network structure itself, whereas Figures 2B and 2C present the results for the second notion of hiding, which results from strategically adding bots or modifying edges, respectively. As can be seen in Figure 2A, out of the three network structures considered in our experiments (scale-free, small-world, and random), the one that provides the greatest level of concealment to the evader is the scale-free structure. Moreover, independent of the network model, the denser the network, the more concealed is the evader. As for the network size, having a larger number of nodes results in a greater level of concealment for scale-free networks, but has a negligible effect for small-world or random networks. When it comes to strategic hiding via network manipulations, Figures 2B and 2C show that it is generally more efficient to strategically hide in networks with greater density.
When comparing the different structures in terms of how they facilitate the strategic hiding, our heuristics are most efficient in scale-free networks, and least efficient in small-world networks, regardless of whether the evader is adding bots or modifying edges. Finally, the effect of the network size on the efficiency of strategic hiding is relatively small. The only exception is when hiding by modifying edges in scale-free networks, which is considerably more effective in larger networks. Next, we compare heuristics of the same type, starting with the ones that add bots, to determine whether they should create a clique amongst themselves or remain disconnected from one another, and to determine which supporters to connect to which bots. As for the former question, creating a clique is consistently superior (see how the shaded area of the triangular sectors in Figure 2B is greater for heuristics with “clique” in their name). As for the question of which supporters to connect to which bots, when bots form a clique, it is more fruitful to connect bots to different supporters (using either the Random clique or Degree clique heuristics) than to connect them all to the same supporters (using the Hub clique heuristic). On the other hand, when bots are disconnected from each other, the results vary depending on the source detection algorithm being used; see Appendix C. Having compared the heuristics that add bots, we now compare the heuristics that modify (some of) the edges that are incident to the evader, to determine whether we should add or remove edges. Our results indicate that the latter is significantly more effective. In fact, adding new edges often backfires, and ends up exposing the evader even more to the source detection algorithm. The only remaining question is to determine which edges to remove from the network.
Our results show that the most effective choice is to disconnect the evader from the neighbors with the greatest degrees, and the least effective choice is to disconnect from those with the lowest degrees. All the results in Figure 2 are shown after the heuristics have made all the modifications to the network. To see how the evader’s ranking changes after each such modification, see Appendices D and E for the heuristics that add bots and modify edges, respectively.
It is difficult to compare the effectiveness of adding nodes and modifying edges based solely on Figure 2, since the figure shows only the impact of adding nodes and modifying edges. To facilitate this comparison, Figure 3 shows how many nodes must be added to the network in order to have the same effect as modifying a single edge. As can be seen, in the majority of cases the effect of modifying a single edge is equivalent to adding several bots. In fact, the number of bots needed to achieve the same effect as modifying a single edge is surprisingly large (and may even reach tens) in scale-free networks. The only exception is the case of sparse, small-world networks, where adding one bot affects the evader’s ranking more than modifying a single edge (as indicated by the values smaller than in the heatmap). The results depicted in the figure are for the Eigenvector source detection algorithm; the results for other source detection algorithms are qualitatively similar, as shown in Appendix F.
[Figure 3 panels: one heatmap per network model; x-axis: Size; y-axis: Average degree; color scale: Effectiveness; cell fill: Relative effectiveness.]
Figure 3: Comparing the effectiveness of adding bots vs. modifying edges when hiding from the Eigenvector source detection algorithm. The effectiveness of modifying an edge is calculated as the number of bots that must be added to achieve the same change in the evader’s ranking (assuming that the modification of edges and the addition of bots are done using the best respective heuristics). The heatmaps evaluate the effectiveness of modifying an edge while varying the network structure. In particular, the left-most heatmap corresponds to scale-free networks generated using the Barabási-Albert (BA) model, the central heatmap corresponds to random networks generated using the Erdős-Rényi (ER) model, and the right-most heatmap corresponds to small-world networks generated using the Watts-Strogatz (WS) model. In each heatmap, the x-axis represents the number of nodes in the network, while the y-axis represents the average degree. The color of each cell corresponds to the average effectiveness of modifying an edge. The size of the unshaded (darker) area in each cell corresponds to the average effectiveness of modifying an edge in that cell, relative to the maximum effectiveness across all cells of that panel. The results are presented as an average over networks and over evaders in each network, after adding bots or modifying edges. The colors in the heatmaps reflect a logarithmic scale.
Another aspect that may impact the effectiveness of hiding the evader is the diffusion time, i.e., the total number of rounds completed in the diffusion process before the source detection algorithm analyzes the network. The results of this analysis can be found in Figure 4. In the vast majority of cases, the evader becomes more hidden as diffusion time increases. This is true not only when the evader performs no modifications to the network, but also when they modify the network by adding bots or by removing edges following the most effective heuristic (although the effectiveness of removing edges grows at a greater rate than that of adding bots). This suggests that if our goal is to identify the source of diffusion, we should start our investigation as early as possible. Shah et al. [31] reported similar findings, but for different diffusion models than ours, namely Susceptible-Infected-Recovered (SIR) and Susceptible-Exposed-Infected-Recovered (SEIR). Next, we compare the source detection algorithms to each other. As can be seen from the figure, the sensitivity to the diffusion time varies greatly from one algorithm to another. When the evader performs no modifications, the Degree and Betweenness algorithms prove to be the most resilient.
Similar results are observed when the evader adds bots to the network. In contrast, when modifying edges, the most resilient algorithms are Rumor and Random Walk, i.e., they are the least affected by changing the diffusion time. Interestingly, all three types of network structures show relatively similar patterns, suggesting that the source detection algorithms’ inner workings play a more important role than the network characteristics in determining how the diffusion time affects the effectiveness of hiding.
[Figure 4 panels: columns Degree, Eigenvector, Closeness, Rumor, Monte Carlo, Random walk, Betweenness; lines: Before hiding, After adding nodes, After modifying edges; x-axis: Diffusion time; y-axis: Evader’s rank.]
Figure 4: The impact of extending the diffusion time on the effectiveness of hiding. In each plot, the x-axis represents the diffusion time, i.e., the total number of rounds in the diffusion process. The y-axis represents the evader’s ranking according to the source detection algorithm; this ranking is computed before the hiding process (green line), after running the best heuristic that adds bots (blue line), and after running the best heuristic that modifies edges (red line). Each column corresponds to a different source detection algorithm, while each row corresponds to a different model used to generate the network in which the diffusion takes place. Results are averaged over evaders and over networks generated using either the Barabási-Albert (BA), the Erdős-Rényi (ER), or the Watts-Strogatz (WS) model, consisting of , nodes each, with the average degree being . The axes in all plots are identical to those used in the upper-left corner. Shaded areas (which are often too small to see) represent -confidence intervals.
So far in our analysis, we only considered networks of , nodes, since the analysis involved taking an average over a large number of cases, and increasing the number of nodes would have taken excessive time. Fortunately, when it comes to evaluating the impact of the hiding process, we can approximate it even for massive networks. Based on this, we increased the number of nodes to , , and approximated the evader’s ranking after each step of the hiding process. The approximation is done by computing the ranking of the evader not among all nodes, but rather among , nodes, consisting of the , infected nodes with the greatest degrees and another , infected nodes chosen uniformly at random from the remaining ones. Furthermore, in our approximation we do not consider the Betweenness and Random walk source detection algorithms, since their ranking cannot be efficiently computed for just a selected subset of nodes. The results of this analysis are presented in Figure 5. In networks generated using the Erdős-Rényi model and the Watts-Strogatz model, the evader usually occupies the top position of the ranking before hiding, whereas in networks generated using the Barabási-Albert model, they are in the top positions. Moreover, in the former two types of networks, the hiding process seems much less effective than in the latter type of networks, regardless of whether the hiding is done by adding nodes or by modifying edges. Moreover, the effectiveness of hiding in these massive networks is considerably reduced compared to smaller networks with , nodes, the results for which were presented in previous figures. Here, we only present results for the best heuristic of each type; see Appendix G for the results of all heuristics. We analyze other, real-life networks in Appendix H, which all support our conclusions.
In this work, we analyze the possibility of obfuscating the origin of diffusion, both as a result of spreading it in a specific type of network structure, and via strategic network manipulations. On the one hand, our theoretical analysis indicates that finding an optimal way of hiding the source of diffusion, either by adding bots or by modifying the network’s edges, is computationally intractable. On the other hand, our experiments demonstrate that even without any strategic manipulations, the structure of the network itself can greatly hinder the efforts to identify “patient zero”.
This seems to be the case especially in scale-free networks, an observation that is particularly alarming since many real-life social networks exhibit this property. We also find that the task of identifying the source of diffusion is more challenging in networks that are dense, allowing the culprit to hide in the crowd. Moreover, a malevolent agent can utilize several network modifications to obfuscate the source even more. Particularly effective strategies in this regard are based on attaching a densely connected group of bots to the network and removing connections between the source of the diffusion and its most well-connected neighbors after the diffusion has started. Our analysis also confirms that the longer the diffusion takes, the more difficult it is to pinpoint its origin, highlighting the importance of prompt reaction to any potential epidemic threat. Finally, our experiments indicate that finding the diffusion source is easier in massive, sparse networks, regardless of whether the evader strategically manipulates the network to hide its identity. Still, given current source-detection algorithms, if the diffusion source tries to hide, it probably will succeed. Future algorithms will need other types of information capturing the time evolution of both the diffusion and the network.
[Figure 5 panels: first row “Adding nodes” (x-axis: Bots added), second row “Modifying edges” (x-axis: Edges changed); y-axis: Evader’s rank; lines: Degree, Eigenvector, Closeness, Rumor, Monte Carlo.]
Figure 5: The effectiveness of hiding in massive networks. Given networks of , nodes with an average degree of , the figure depicts the evader’s ranking (y-axis) as a function of the number of network modifications (x-axis), using the best heuristic for adding nodes (first row) or modifying edges (second row). Different colors represent different source detection algorithms. The results are presented as an average over networks generated using either the Barabási-Albert (BA), the Erdős-Rényi (ER), or the Watts-Strogatz (WS) model, and over different evaders in each network. Shaded areas represent -confidence intervals.
While early social bots were only able to retweet human-generated content, modern-day bots can engage in a growing range of interactions, such as conversing with people, commenting on their posts, answering their questions, and gathering new followers [15].
Admittedly, such bots can have other goals than hiding the source of diffusion, such as promoting products [23], spreading fake news [33], and influencing the results of political campaigns [5, 19]. Nevertheless, it was observed that bots are typically densely connected between themselves, with relatively few connections to legitimate users [8, 36, 45], which is exactly the structure that our analysis found to be the most effective in hiding the evader from source detection algorithms.

Significant attention in the literature was given to detecting the bots that are embedded in a given social network [9, 45, 36, 44, 38, 37]. However, our work is different in nature, as our goal is not to identify bots, but rather the source of diffusion who might be using bots to avoid detection. There also exists a growing literature on avoiding detection by a wide range of social network analysis tools. Such hiding techniques can be used to prevent a closely-cooperating group of nodes from being identified by community detection algorithms [41], and prevent a leader of the organization from being recognized by centrality measures in both standard [40, 43] and multilayer networks [39]. Similar techniques can be used to prevent an undisclosed relationship from being pinpointed by link prediction algorithms [42, 46]. Nevertheless, none of the existing works considered strategically hiding the source of diffusion from source detection algorithms. What is more, they typically only study hiding by adding edges to, and removing edges from, a network while disregarding the possibility of avoiding detection by adding nodes to the network.

The direct policy implication of our work is that one has to be sure the source has not been trying to hide itself to trust the source detection algorithms of today. However, our work also points to the future, namely to the necessary elements of the next generation's source detection algorithms. These would need to be fed with new types of information.
Since hiding by manipulating edges is so efficient compared to adding fake nodes, new algorithms need to identify spurious links, i.e., connections added since the start of the diffusion. Arguably, the need for such algorithms is more pressing than ever. We can see this not only in the realm of malevolent information diffusion but also for epidemics, as evidenced by uncertainties surrounding the origin of the ongoing COVID-19 pandemic. Developing tamper-proof source detection algorithms would improve our chances of detecting patient zero when facing emergent outbreaks of infectious diseases in the future.

Let us denote by $G = (V, E) \in \mathbb{G}$ a network, where $V$ is the set of $n$ nodes and $E \subseteq V \times V$ is the set of edges. We denote an edge between nodes $v$ and $w$ by $(v, w)$, and we only consider undirected networks, implying that we do not discern between edges $(v, w)$ and $(w, v)$. Moreover, we assume that networks do not contain self-loops, i.e., $\forall_{v \in V}\ (v, v) \notin E$. We denote by $\bar{E}$ the set of all non-edges, i.e., $\bar{E} = (V \times V) \setminus \left( E \cup \bigcup_{v \in V} \{(v, v)\} \right)$.

A path in a network $G = (V, E)$ is an ordered sequence of distinct nodes, $\langle v_1, \ldots, v_k \rangle$, in which every two consecutive nodes are connected by an edge in $E$. We consider the length of a path to be the number of edges in that path. The set of all shortest paths between a pair of nodes $v, w \in V$ is denoted by $\Pi_G(v, w)$, while the distance between a pair of nodes $v, w \in V$, i.e., the length of a shortest path between them, is denoted by $d_G(v, w)$. Furthermore, a network is said to be connected if and only if there exists a path between every pair of nodes in that network. We denote by $N_G(v)$ the set of neighbors of $v$ in $G$, i.e., $N_G(v) = \{ w \in V : (v, w) \in E \}$. We denote by $G_{V'}$ the subnetwork of $G$ induced by the nodes in $V' \subseteq V$, i.e., $G_{V'} = (V', E \cap (V' \times V'))$. Finally, for $E' \subseteq V \times V$ we denote by $G \cup E'$ the effect of adding the set of edges $E'$ to $G$, i.e., $G \cup E' = (V, E \cup E')$. To make the notation more readable, we will often omit the network itself from the notation whenever it is clear from the context, e.g., by writing $d(v, w)$ instead of $d_G(v, w)$. This applies not only to the notation presented thus far, but rather to all notation in this article.

In the Susceptible-Infected (SI) model, every node in the network is in one of two states: either susceptible (prone to be affected by the phenomenon) or infected (already affected by the phenomenon). The modeled process consists of discrete rounds. At the beginning of the process only the nodes belonging to the seed set are in the infected state (in this work, the seed set consists of only the evader $v^\dagger$). In every round $t$, every infected node makes each of its susceptible neighbors infected with probability $p$. The process ends after a certain number of rounds $T$. We denote the set of infected nodes after $T$ rounds by $I$.

A source detection algorithm is a procedure that, based on the network $G$ and the set of infected nodes $I$, aims at determining the source of diffusion. Every source detection algorithm considered in this work can be represented as a function $\sigma : V \times \mathbb{G} \times 2^V \to \mathbb{R}$ that assigns the score $\sigma(v, G, I)$ to any node $v$, where the node with the highest score is selected by the algorithm as the most probable source of diffusion. We will assume that for any node $v \notin I$ and any source detection algorithm we have $\sigma(v, G, I) = -\infty$ (as the seed node has to be infected and there is no mechanism of coming back to the susceptible state). In this work we focus on the source detection algorithms that are designed to detect the source of a diffusion process with a seed set consisting of only one node (see Shelke and Attar [34] for a review of multiple source detection algorithms).
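The SI process and the convention that non-infected nodes receive a score of $-\infty$ can be illustrated with a short simulation. The following is a minimal sketch rather than the implementation used in the experiments: the function names `simulate_si` and `degree_score`, the adjacency-dictionary representation, and the optional random generator argument are all assumptions made for this illustration.

```python
import random


def simulate_si(adjacency, seed, p, rounds, rng=None):
    """Run the SI process for `rounds` rounds; return the set of infected nodes.

    `adjacency` maps each node to an iterable of its neighbors (hypothetical
    representation chosen for this sketch).
    """
    rng = rng or random.Random()
    infected = {seed}
    for _ in range(rounds):
        newly_infected = set()
        # Every node infected at the start of the round tries to infect
        # each of its susceptible neighbors independently with probability p.
        for v in infected:
            for w in adjacency[v]:
                if w not in infected and rng.random() < p:
                    newly_infected.add(w)
        infected |= newly_infected
    return infected


def degree_score(adjacency, infected, v):
    """Degree of v in the subnetwork induced by the infected nodes.

    Non-infected nodes get -infinity, since they cannot be the source.
    """
    if v not in infected:
        return float("-inf")
    return sum(1 for w in adjacency[v] if w in infected)


# Usage on a path 0 - 1 - 2 - 3 with certain transmission (p = 1).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
infected = simulate_si(path, seed=0, p=1.0, rounds=2)
scores = {v: degree_score(path, infected, v) for v in path}
```

With $p = 1$ the infection advances exactly one hop per round, which makes the example deterministic; for $0 < p < 1$ the infected set is random and a seeded `random.Random` instance can be passed for reproducibility.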
More specifically, we consider the following source detection algorithms:
• Degree [11]: the score assigned to a given $v \in I$ is the degree centrality of $v$ in $G_I$, i.e.: $\sigma_{degr}(v, G, I) = |N_{G_I}(v)|$;
• Closeness [11]: the score assigned to a given $v \in I$ is the closeness centrality of $v$ in $G_I$, i.e.: $\sigma_{clos}(v, G, I) = \frac{1}{\sum_{w \in I} d_{G_I}(v, w)}$;
• Betweenness [11]: the score assigned to a given $v \in I$ is the betweenness centrality of $v$ in $G_I$, i.e.: $\sigma_{betw}(v, G, I) = \sum_{u \neq w :\ u, w \in I \setminus \{v\}} \frac{|\{ \pi \in \Pi_{G_I}(u, w) : v \in \pi \}|}{|\Pi_{G_I}(u, w)|}$;
• Eigenvector [11]: the score assigned to a given $v \in I$ is the eigenvector centrality of $v$ in $G_I$, i.e.: $\sigma_{eig}(v, G, I) = x_v$, where $x$ is the eigenvector corresponding to the largest eigenvalue of the adjacency matrix of $G_I$;
• Rumor [32]: the score assigned to a given $v \in I$ is the rumor centrality of $v$ in $G_I$, i.e.: $\sigma_{rumor}(v, G, I) = \frac{|I|!}{\prod_{w \in I} \Theta^v_w}$, where $\Theta^v_w$ is the size of the subtree of $w$ in the BFS (Breadth-First Search) tree of $G_I$ rooted at $v$;
• Random Walk [21]: intended to approximate diffusion by random walks. The score of a given node $v$ is: $\sigma_{rwalk}(v, G, I) = \begin{cases} \phi_0(v) & \text{if } \forall_{w \in I}\ d_G(v, w) \leq T \\ 0 & \text{otherwise} \end{cases}$ where $T$ is the number of rounds in the SI model and $\phi$ is defined as: $\phi_t(v) = \begin{cases} 1 & \text{if } t = T \\ (1 - p)\, \phi_{t+1}(v) + \sum_{w \in N(v) \cap I} \frac{p}{|N(v)|}\, \phi_{t+1}(w) & \text{otherwise} \end{cases}$ where $p$ is the probability of infection in the SI model;
• Monte Carlo [2]: for each node we repeatedly run a diffusion starting with that node and investigate for which of the nodes the infected set is the most similar to $I$ (using Jaccard similarity). The score of a given node $v$ is: $\sigma_{mcarlo}(v, G, I) = \frac{1}{m} \sum_{i=1}^{m} \exp\left( -\frac{(\psi_J(I, I_{v,i}) - 1)^2}{a^2} \right)$ where $m$ is the number of Monte Carlo samples for each node, $\psi_J(A, B) = \frac{|A \cap B|}{|A \cup B|}$ is the Jaccard similarity measure, $I_{v,i}$ is the set of infected nodes in the $i$-th Monte Carlo sample where the diffusion starts with $v$, and $a$ is the soft margin parameter.

References
[1] N. K. Ahmed, F. Berchmans, J. Neville, and R. Kompella. Time-based sampling of social network activity graphs.
In SIGKDD MLG, pages 1–9, 2010.
[2] N. Antulov-Fantulin, A. Lančić, T. Šmuc, H. Štefančić, and M. Šikić. Identification of patient zero in static and temporal networks: Robustness and limitations. Phys. Rev. Lett., 114(24):248701, 2015.
[3] A.-L. Barabási. Network Science. Cambridge University Press, Cambridge, 2016.
[4] A. Barrat, M. Barthelemy, and A. Vespignani. Dynamical Processes on Complex Networks. Cambridge University Press, Cambridge, 2008.
[5] A. Bessi and E. Ferrara. Social bots distort the 2016 US presidential election online discussion. First Monday, 21(11-7), 2016.
[6] P. Block, M. Hoffman, I. J. Raabe, J. B. Dowd, C. Rahal, R. Kashyap, and M. C. Mills. Social network-based distancing strategies to flatten the COVID-19 curve in a post-lockdown world. Nat. Hum. Behav., pages 588–596, 2020.
[7] M. Boguná, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas. Models of social networks based on social distance attachment. Phys. Rev. E, 70(5):056122, 2004.
[8] Q. Cao, M. Sirivianos, X. Yang, and T. Pregueiro. Aiding the detection of fake accounts in large scale social online services. In The 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI 12), pages 197–210, 2012.
[9] Q. Cao, X. Yang, J. Yu, and C. Palow. Uncovering large groups of active malicious accounts in online social networks. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 477–488, 2014.
[10] W. A. Chiu, R. Fischer, and M. L. Ndeffo-Mbah. State-level needs for social distancing and contact tracing to contain COVID-19 in the United States. Nat. Hum. Behav., pages 1080–1090, 2020.
[11] C. H. Comin and L. da Fontoura Costa. Identifying the starting point of a spreading process in complex networks. Phys. Rev. E, 84(5):056105, 2011.
[12] Committee for the Coordination of Statistical Activities. How COVID-19 is changing the world: A statistical perspective, volume 1. Technical report, UNICEF, 2020.
[13] J. R. Douceur. The Sybil attack. In International Workshop on Peer-to-Peer Systems, pages 251–260. Springer, 2002.
[14] P. Erdős and T. Gallai. Graphs with prescribed degrees of vertices. Mat. Lapok, 11:264–274, 1960.
[15] E. Ferrara, O. Varol, C. Davis, F. Menczer, and A. Flammini. The rise of social bots. Commun. ACM, 59(7):96–104, 2016.
[16] W. Goffman and V. A. Newill. Generalization of epidemic theory: An application to the transmission of ideas. Nature, 204(4955):225–228, 1964.
[17] S. L. Hakimi. On realizability of a set of integers as degrees of the vertices of a linear graph I. J. Soc. Ind. Appl. Math., 10(3):496–506, 1962.
[18] V. Havel. A remark on the existence of finite graphs. Casopis Pest. Mat., 80:477–480, 1955.
[19] P. N. Howard and B. Kollanyi. Bots, Lancet, 395(10223):497–506, 2020.
[21] A. Jain, V. Borkar, and D. Garg. Fast rumor source identification via random walks. Soc. Netw. Anal. Min., 6(1):62, 2016.
[22] W. O. Kermack and A. G. McKendrick. A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character, 115(772):700–721, 1927.
[23] K. Lee, B. D. Eoff, J. Caverlee, et al. Seven months with the devils: A long-term study of content polluters on Twitter. In AAAI Intl. Conference on Weblogs and Social Media (ICWSM), pages 185–192, 2011.
[24] S. Lehmann and Y.-Y. Ahn. Complex Spreading Phenomena in Social Systems: Influence and Contagion in Real-World Social Networks. Springer, Cham, 2018.
[25] J. Leskovec and J. J. Mcauley. Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems, pages 539–547, 2012.
[26] A. Lokhov, M. Mezard, H. Ohta, and L. Zdeborova. Inferring the origin of an epidemic with dynamic message-passing algorithm. Phys. Rev. E, 90:012801, 03 2013.
[27] D. Lu. The hunt to find the coronavirus pandemic's patient zero. New Scientist, 245(3276):9, 2020.
[28] F. Nagle and L. Singh. Can friends be trusted? Exploring privacy in online social networks. In , pages 312–315. IEEE, 2009.
[29] M. Ripeanu, A. Iamnitchi, and I. Foster. Mapping the Gnutella network. IEEE Internet Comput., 6(1):50–57, 2002.
[30] L. E. C. Rocha, F. Liljeros, and P. Holme. Information dynamics shape the sexual networks of internet-mediated prostitution. Proc. Natl. Acad. Sci. USA, 107(13):5706–5711, 2010.
[31] C. Shah, N. Dehmamy, N. Perra, M. Chinazzi, A.-L. Barabási, A. Vespignani, and R. Yu. Finding patient zero: Learning contagion source with graph neural networks. arXiv 2006.11913, 2020.
[32] D. Shah and T. Zaman. Rumors in a network: Who's the culprit? IEEE Trans. Inf. Theory, 57(8):5163–5181, 2011.
[33] C. Shao, G. L. Ciampaglia, O. Varol, K.-C. Yang, A. Flammini, and F. Menczer. The spread of low-credibility content by social bots. Nature Comm., 9(1):4787, 2018.
[34] S. Shelke and V. Attar. Source detection of rumor in social network: A review. Online Soc. Netw. Media, 9:30–42, 2019.
[35] B. Thomas, R. Jurdak, K. Zhao, and I. Atkinson. Diffusion in colocation contact networks: The impact of nodal spatiotemporal dynamics. PLOS One, 11(8):e0152624, 2016.
[36] B. Viswanath, A. Post, K. P. Gummadi, and A. Mislove. An analysis of social network-based Sybil defenses. Comput. Comm. Rev., 40(4):363–374, 2010.
[37] G. Wang, T. Konolige, C. Wilson, X. Wang, H. Zheng, and B. Y. Zhao. You are how you click: Clickstream analysis for Sybil detection. In , pages 241–256, 2013.
[38] G. Wang, M. Mohanlal, C. Wilson, X. Wang, M. Metzger, H. Zheng, and B. Y. Zhao. Social Turing tests: Crowdsourcing Sybil detection. In NDSS Symposium 2013. Internet Society, 2013.
[39] M. Waniek, T. Michalak, and T. Rahwan. Hiding in multilayer networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 1021–1028, 2020.
[40] M. Waniek, T. P. Michalak, T. Rahwan, and M. Wooldridge. On the construction of covert networks. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pages 1341–1349, 2017.
[41] M. Waniek, T. P. Michalak, M. J. Wooldridge, and T. Rahwan. Hiding individuals and communities in a social network. Nat. Hum. Behav., 2(2):139–147, 2018.
[42] M. Waniek, K. Zhou, Y. Vorobeychik, E. Moro, T. P. Michalak, and T. Rahwan. How to hide one's relationships from link prediction algorithms. Sci. Rep., 9(1):12208, 2019.
[43] T. Wąs, M. Waniek, T. Rahwan, and T. Michalak. The manipulability of centrality measures-an axiomatic approach. In Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems, pages 1467–1475, 2020.
[44] K.-C. Yang, O. Varol, C. A. Davis, E. Ferrara, A. Flammini, and F. Menczer. Arming the public with artificial intelligence to counter social bots. Hum. Behav. Emerg. Tech., 1(1):48–61, 2019.
[45] H. Yu, M. Kaminsky, P. B. Gibbons, and A. Flaxman. Sybilguard: defending against Sybil attacks via social networks. In Proceedings of the 2006 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pages 267–278, 2006.
[46] K. Zhou, T. P. Michalak, M. Waniek, T. Rahwan, and Y. Vorobeychik. Attacking similarity-based link prediction in social networks. In Proceedings of the 18th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages 305–313, 2019.
Formal Definitions of the Decision Problems
We now formally define the computational problems faced by the evader. In what follows, let $\#(v, G, \sigma, I)$ denote the ranking position of $v$ among all nodes in $G$ according to source detection algorithm $\sigma$ when the set of infected nodes is $I$. More formally:
$$\#(v, G, \sigma, I) = |\{ w \in V : \sigma(w, G, I) > \sigma(v, G, I) \}|.$$
The goal of the evader is to hide by decreasing their position in the ranking produced by $\sigma$ (notice that decreasing the position in the ranking corresponds to maximizing the value of $\#(v, G, \sigma, I)$).

The first method of hiding that we consider is to add bots to the network. Then, the problem faced by the evader is to determine the contacts of each bot. Since not every node in the network is necessarily willing to accept connections from these bots, we define a subset of nodes, $\widehat{S} \subseteq V$, that would accept such connections. More formally, the problem can be defined as follows:

Definition 1 (Hiding Source by Adding Nodes). The problem is defined by a tuple, $(G, v^\dagger, I, \sigma, \omega, b, \nabla, \widehat{S})$, where $G = (V, E)$ is a network, $v^\dagger \in V$ is the evader, $I \subseteq V$ is the set of infected nodes, $\sigma$ is a source detection algorithm, $\omega \in \mathbb{N}$ is a safety threshold specifying the smallest ranking that the evader deems acceptable, $b \in \mathbb{N}$ is a budget specifying the maximum number of edges that can be added, $\nabla$ is the set of bots to be added to the network, and $\widehat{S} \subseteq V$ is the set of nodes that the evader can connect to the bots. The goal is then to identify a set $A^* \subseteq (\nabla \times \nabla) \cup (\nabla \times \widehat{S})$ such that $|A^*| \leq b$, $(V \cup \nabla, E \cup A^*)_I$ is connected and:
$$\#\left( v^\dagger, (V \cup \nabla, E \cup A^*), \sigma, I \cup \nabla \right) \geq \omega.$$
If the algorithm $\sigma$ is nondeterministic, then we require the above condition to be met for every possible realization of the algorithm.

We also consider an alternative way in which the evader may conceal their true nature as the source of the diffusion. Instead of adding bots to the network, the evader can modify (i.e., add or remove) the network edges after the diffusion has taken place. In this case, the problem faced by the evader can be defined as follows:

Definition 2 (Hiding Source by Modifying Edges). The problem is defined by a tuple, $(G, v^\dagger, I, \sigma, \omega, b, \widehat{A}, \widehat{R})$, where $G = (V, E)$ is a network, $v^\dagger \in V$ is the evader, $I \subseteq V$ is the set of infected nodes, $\sigma$ is a source detection algorithm, $\omega \in \mathbb{N}$ is a safety threshold specifying the smallest ranking that the evader deems acceptable, $b \in \mathbb{N}$ is a budget specifying the maximum number of edges that can be added or removed, $\widehat{A} \subseteq \bar{E}$ is the set of edges that can be added, and $\widehat{R} \subseteq E$ is the set of edges that can be removed. The goal is then to identify two sets, $A^* \subseteq \widehat{A}$ and $R^* \subseteq \widehat{R}$, such that $|A^*| + |R^*| \leq b$, $(V, (E \cup A^*) \setminus R^*)_I$ is connected and:
$$\#\left( v^\dagger, (V, (E \cup A^*) \setminus R^*), \sigma, I \right) \geq \omega.$$
If the algorithm $\sigma$ is nondeterministic, then we require the above condition to be met for every possible realization of the algorithm.

Proofs of the Computational Complexity Results
Table 2 summarizes our findings and refers to the theorem corresponding to each result.

Source detection algorithm   Adding Nodes               Modifying Edges
Degree                       P (Theorem 1)              NP-complete (Theorem 7)
Closeness                    NP-complete (Theorem 2)    NP-complete (Theorem 8)
Betweenness                  NP-complete (Theorem 3)    NP-complete (Theorem 9)
Rumor                        NP-complete (Theorem 4)    NP-complete (Theorem 10)
Random Walk                  NP-complete (Theorem 5)    NP-complete (Theorem 11)
Monte Carlo                  NP-complete (Theorem 6)    NP-complete (Theorem 12)

Table 2: Summary of our computational complexity results. For different source detection algorithms, we consider the decision problem that the evader must solve in order to hide optimally from the algorithm by either adding nodes or modifying edges. P = solvable in polynomial time; NP-complete = Non-deterministic Polynomial-time complete, implying that no known algorithm can solve it in polynomial time.
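The condition that both decision problems ask the evader to satisfy can be made concrete with a few lines of code. The sketch below is illustrative only and is not taken from the article: it computes Closeness scores on the subnetwork induced by the infected nodes, counts how many nodes score strictly higher than a given node (its ranking position), and compares that count with the safety threshold. The function names and the adjacency-dictionary representation are assumptions, and the induced subnetwork is assumed to be connected, as the definitions require.

```python
from collections import deque


def bfs_distances(adjacency, source, allowed):
    """Shortest-path distances from `source`, moving only through `allowed` nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adjacency[v]:
            if w in allowed and w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness_scores(adjacency, infected):
    """Closeness score of every node in the subnetwork induced by `infected`."""
    scores = {}
    for v in adjacency:
        if v not in infected:
            scores[v] = float("-inf")  # a non-infected node cannot be the source
            continue
        total = sum(bfs_distances(adjacency, v, infected).values())
        # A lone infected node has no distances to sum; treat it as maximally central.
        scores[v] = 1.0 / total if total > 0 else float("inf")
    return scores


def ranking(scores, v):
    """Number of nodes whose score is strictly greater than the score of v."""
    return sum(1 for w in scores if scores[w] > scores[v])


def is_hidden(scores, evader, omega):
    """True if the evader's ranking position reaches the safety threshold omega."""
    return ranking(scores, evader) >= omega


# Usage: a star with center 0 and leaves 1, 2, 3, all infected.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
scores = closeness_scores(star, {0, 1, 2, 3})
center_rank = ranking(scores, 0)  # no node is more central than the center
leaf_rank = ranking(scores, 1)    # only the center outranks a leaf
```

In this toy example a source sitting at a leaf already satisfies a threshold of 1, while a source at the center would have to modify the network to do so.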
Theorem 1. The problem of Hiding Source by Adding Nodes is in P given the Degree source detection algorithm. In particular, Algorithm 1 finds a solution to the given instance $(G, v^\dagger, I, \sigma_{degr}, \omega, b, \nabla, \widehat{S})$ of the problem.

Proof. We will analyze Algorithm 1 and show that it finds a solution to the instance $(G, v^\dagger, I, \sigma_{degr}, \omega, b, \nabla, \widehat{S})$ of the problem.

In order for a given set of edges $A$ to be a solution, we need to have at least $\omega$ infected nodes with degrees at least $goal = |N_{G_I}(v^\dagger)| + 1$ (the value computed in line 1). Notice that some infected nodes might already have the required degree, so we only need to increase the degrees of $\omega^* = \omega - |\{ v \in I : |N_{G_I}(v)| \geq goal \}|$ nodes (the value computed in line 2). If there are already at least $\omega$ infected nodes with degrees greater than that of $v^\dagger$, the solution is the empty set, returned in line 3.

Notice that by adding a set of edges $A^* \subseteq (\nabla \times \nabla) \cup (\nabla \times (\widehat{S} \cup \{v^\dagger\}))$ we can only increase degrees of nodes in $\nabla \cup \widehat{S} \cup \{v^\dagger\}$. Since it is never beneficial to increase the degree of $v^\dagger$, a solution has to increase the degree of at least $\omega^*$ nodes in $\nabla \cup \widehat{S}$ that initially have degree lower than $goal$ to at least $goal$. In line 4 we identify the set of nodes in $\widehat{S}$ that have degrees lower than $goal$, and thus are candidates for satisfying the threshold $\omega^*$ (notice that we already counted the nodes in $\widehat{S}$ whose degrees are at least $goal$ in line 2).

In lines 6-31 we will compute a smallest set of edges $A^*$ that needs to be added to $G$ so that the threshold $\omega^*$ is satisfied by $m$ nodes from $S$ and $\omega^* - m$ nodes from $\nabla$, for every potential value of $m$ (the loop in line 5). Notice that if $\omega^* > |\nabla|$ we need to increase the degree of at least $\omega^* - |\nabla|$ nodes in $S$, otherwise it is possible to satisfy the threshold with just nodes in $\nabla$ (see the expression $\max(0, \omega^* - |\nabla|)$ in line 5).
Notice also that we never need to increase the degree of more than $\omega^*$ nodes in $S$ (see the expression $\min(|S|, \omega^*)$ in line 5). If the said smallest set of edges $A^*$ for a given $m$ is within the evader's budget (the test performed in line 32), we return it in line 33. Notice that if for every $m$ the size of such smallest $A^*$ is greater than the budget, then there is no solution to the problem (the value $\bot$ returned in line 34).

Algorithm 1 Finding an optimal solution for the Hiding Source by Adding Nodes problem given the Degree source detection algorithm.
Input: Network $G = (V, E)$, evader $v^\dagger \in V$, set of infected nodes $I \subseteq V$, safety threshold $\omega \in \mathbb{N}$, budget $b \in \mathbb{N}$, set of evader-controlled nodes $\nabla$, set of supporters $\widehat{S} \subseteq V$.
Output: Solution $A^*$ to instance $(G, v^\dagger, I, \sigma_{degr}, \omega, b, \nabla, \widehat{S})$ of the Hiding Source by Adding Nodes problem or $\bot$ if there is no solution.
 1: $goal \leftarrow |N_{G_I}(v^\dagger)| + 1$
 2: $\omega^* \leftarrow \omega - |\{ v \in I : |N_{G_I}(v)| \geq goal \}|$
 3: if $\omega^* \leq 0$ then return $\emptyset$
 4: $S \leftarrow \{ v \in \widehat{S} : |N_{G_I}(v)| < goal \}$
 5: for $m \leftarrow \max(0, \omega^* - |\nabla|), \ldots, \min(|S|, \omega^*)$ do
 6:     $S^* \leftarrow \langle s_i \rangle_{i=1}^{m}$ such that $s_i$ is the $i$-th node from $S$ in order of non-increasing $|N_{G_I}(s_i)|$
 7:     if $(m = 0 \lor goal - |N_{G_I}(s_m)| < |\nabla|) \land (m = \omega^* \lor |\widehat{S}| + |\nabla| > goal)$ then
 8:         $A^* \leftarrow \emptyset$
 9:         $\nabla^* \leftarrow \langle \delta_i \rangle_{i=1}^{\omega^* - m}$ such that $\forall_i\ \delta_i \in \nabla$ and $\forall_{i \neq j}\ \delta_i \neq \delta_j$
10:         if $m > 0$ then
11:             $j^* \leftarrow 1$
12:             for $s_i \in S^*$ do
13:                 $x_i \leftarrow goal - |N_{G_I \cup A^*}(s_i)|$
14:                 if $x_i < |\nabla^*|$ then
15:                     while $x_i > 0$ do
16:                         $A^* \leftarrow A^* \cup \{ (s_i, \delta_{j^*}) \}$
17:                         $j^* \leftarrow (j^* \bmod |\nabla^*|) + 1$
18:                         $x_i \leftarrow x_i - 1$
19:                 else
20:                     $A^* \leftarrow A^* \cup (\{s_i\} \times \nabla^*) \cup \mathrm{select}(x_i - |\nabla^*|, \nabla \setminus \nabla^*)$
21:         $\delta^* \leftarrow \arg\max_{\delta_i \in \nabla^*} (goal - |N_{G_I \cup A^*}(\delta_i)|)$
22:         if $goal - |N_{G_I \cup A^*}(\delta^*)| \geq |\nabla^*|$ then
23:             for $\delta_i \in \nabla^*$ do
24:                 if $goal - |N_{G_I \cup A^*}(\delta_i)| \geq |\nabla^*|$ then
25:                     $A^* \leftarrow A^* \cup \left( \{\delta_i\} \times \mathrm{select}\left( goal - |N_{G_I \cup A^*}(\delta_i)| - |\nabla^*| + 1,\ (\widehat{S} \cup \nabla) \setminus (\nabla^* \cup N_{G_I \cup A^*}(\delta_i)) \right) \right)$
26:         else
27:             if $\sum_{\delta_i \in \nabla^*} (goal - |N_{G_I \cup A^*}(\delta_i)|) \bmod 2 = 1$ then
28:                 $A^* \leftarrow A^* \cup \left( \{\delta^*\} \times \mathrm{select}(1,\ (\widehat{S} \cup \nabla) \setminus (\nabla^* \cup N_{G_I \cup A^*}(\delta^*))) \right)$
29:         Connect nodes in $\nabla^*$ into a network such that the degree of $\delta_i$ is $goal - |N_{G_I \cup A^*}(\delta_i)|$ using the Havel-Hakimi algorithm
30:         if $m = 0$ and $G_I \cup A^*$ disconnected then
31:             $A^* \leftarrow A^* \cup \left( \{\delta_1\} \times \mathrm{select}(1,\ \widehat{S} \setminus N_{G_I \cup A^*}(\delta_1)) \right)$
32:         if $|A^*| \leq b$ then
33:             return $A^*$
34: return $\bot$

Since all edges added to nodes in $S$ (and increasing their degree) must connect them to nodes in $\nabla$, increasing the degrees of nodes that already have high degrees will result in the smallest possible size of $A^*$ (notice that if we were allowed to add edges between the nodes in $S$, we would need to take existing edges in $S \times S$ into consideration). Hence, in line 6 we select the sequence $S^*$ of $m$ nodes from $S$ that will count towards satisfying the threshold $\omega^*$ as nodes with greatest degrees. Notice that if $m = 0$ then $S^*$ is empty.

There are two more conditions necessary for the existence of a solution $A^*$ for a given $m$ (both tested in line 7). Every node $s_i \in S^*$ contributing to satisfying the threshold $\omega^*$ needs to be connected with at least $goal - |N_{G_I}(s_i)|$ nodes from $\nabla$, and the expression $goal - |N_{G_I}(s_m)| < |\nabla|$ in line 7 checks this condition for node $s_m$, which needs the greatest number of connections. Notice that if $m = 0$ then there is no need to check this condition. The second condition is that every node from $\nabla$ contributing to satisfying the threshold $\omega^*$ needs to be connected with at least $goal$ nodes from $\nabla \cup \widehat{S}$ (as initially its degree is $0$), and the expression $|\widehat{S}| + |\nabla| > goal$ in line 7 checks this condition.
Notice that if $m = \omega^*$ then there is no need to check this condition.

In line 8 we initialize the solution $A^*$ (as we are now sure it exists), whereas in line 9 we select the sequence $\nabla^*$ of $\omega^* - m$ nodes from $\nabla$ that will count towards satisfying the threshold $\omega^*$. Notice that if $m = \omega^*$ then $\nabla^*$ is empty.

In lines 10-20 we increase the degree of all nodes in $S^*$ by connecting them primarily with nodes in $\nabla^*$ (as either way we need to increase their degrees and this way we obtain the smallest size of $A^*$), and then, if there are not enough nodes in $\nabla^*$, with other nodes from $\nabla$. Let $y_i$ be the number of additional edges we need to connect to $\delta_i \in \nabla^*$ to increase its degree to $goal$ after executing lines 10-20, i.e., $y_i = goal - |N_{G_I \cup A^*}(\delta_i)|$. Notice that because of the way we distribute the connections with $S^*$ among the nodes in $\nabla^*$, we have that $\forall_{\delta_i, \delta_j \in \nabla^*}\ |y_i - y_j| \leq 1$ (see Figure 6).

[Figure 6 legend: stubs connected to nodes from $S^*$; stubs connected to nodes from $\nabla^*$; stubs connected to nodes from $\nabla^*$ if there are enough of them, and from $\nabla \cup \widehat{S}$ otherwise.]
Figure 6: Distribution of stubs for nodes in $\nabla^*$ in the proof of Theorem 1.

Without loss of generality, assume that $y_i > y_j + 1$, i.e., $\delta_j$ is connected with at least two more nodes from $S^*$ than $\delta_i$. We can disconnect $\delta_j$ from any $v \in \widehat{S}$ among its neighbors not connected to $\delta_i$ (notice that as $y_i > y_j$, there must exist at least one such node), and instead connect $v$ with $\delta_i$. This decreases the difference between $y_i$ and $y_j$ by one. If by performing this operation we decreased the degree of $\delta_j$ below $goal$, we should disconnect one of the neighbors of $\delta_i$ outside $S^*$ not connected to $\delta_j$ (again, from $y_i > y_j$ it follows that there exists at least one such node), and instead connect it to $\delta_j$, thus ensuring the correctness of the solution. By repeating this operation we can decrease the maximal difference between any $y_i$ and any $y_j$ to $1$.
Hence, if there exists a minimal size solution such that $\exists_{\delta_i, \delta_j \in \nabla^*}\ y_i - y_j > 1$ then there also exists a solution of the same size such that $\forall_{\delta_i, \delta_j \in \nabla^*}\ |y_i - y_j| \leq 1$.

After increasing the degrees of all nodes in $S^*$ to at least $goal$, we now need to increase the degrees of the nodes in $\nabla^*$. Again, to obtain the smallest possible size of $A^*$, we will add as many edges as possible from $\nabla^* \times \nabla^*$, as opposed to between members of $\nabla^*$ and nodes from outside $\nabla^*$. Let us denote the minimal number of necessary new connections among the members of $\nabla^*$ by $z$. In line 21 we identify $\delta^*$ as the member of $\nabla^*$ that needs the greatest number of connections to be added to it (i.e., with the greatest value of $y_i$). Notice that because of the way we constructed the set $A^*$ thus far, all other nodes in $\nabla^*$ need exactly as many new connections as $\delta^*$, or at most one less. Hence, either all nodes in $\nabla^*$ need exactly $z$ new connections, or some of them need $z$ new connections, while others (including $\delta^*$) need $z + 1$ new connections (all considered at the moment of executing line 21).

In line 22 we check whether the maximal number of required new edges is greater than the size of $\nabla^*$. If that is the case, it is inevitable to connect some nodes in $\nabla^*$ with nodes from outside of $\nabla^*$, which we do in lines 23-25. Notice that the choice of nodes from outside of $\nabla^*$ does not matter (as either way only one end of the edge will contribute to satisfying the threshold $\omega^*$), so we use the function $\mathrm{select}(k, X)$ that selects $k$ elements from the set $X$. Let us also assume that it prefers members of $\widehat{S}$. Notice also that after this operation all nodes in $\nabla^*$ will need exactly $|\nabla^*| - 1$ new connections.

Moreover, the sum of degrees in a network induced by $\nabla^*$ has to be even, hence in lines 27-28 we add an additional edge with one end in $\nabla^*$ if necessary. Notice that if we executed lines 23-25 then the sum of degrees is guaranteed to be even (as it is $|\nabla^*| (|\nabla^*| - 1)$).
Notice also that since we add this edge to $\delta^*$, it is still true that either all nodes in $\nabla^*$ need $z$ more connections or some of them need $z$, while others need $z + 1$ connections.

In line 29 we connect the nodes in $\nabla^*$ into a network that finally satisfies the threshold $\omega^*$. We do it using the Havel-Hakimi algorithm [18, 17], which connects a given set of nodes into a network with a given sequence of degrees if it is possible.

We will now show that this is indeed possible. We will do so using the Erdős-Gallai theorem [14], which states that a given sequence $d_1 \geq \ldots \geq d_n$ can be realized as a network if and only if $\sum_{i=1}^{n} d_i$ is even and:
$$\forall_{1 \leq k \leq n} \quad \sum_{i=1}^{k} d_i \leq k(k - 1) + \sum_{i=k+1}^{n} \min(d_i, k). \qquad (1)$$
As argued above, the sum of the number of new connections required by nodes in $\nabla^*$ to satisfy the threshold $\omega^*$ is even. Let $n$ denote the size of $\nabla^*$, and let $m$ denote the number of nodes in $\nabla^*$ that require $z + 1$ new connections (notice that $0 \leq m < n$). The sequence of the degrees is such that $d_i = z + 1$ if $i \leq m$ and $d_i = z$ otherwise. We can assume that $z < n - 1$, as otherwise all nodes in $\nabla^*$ need exactly $z = n - 1$ new connections (as we executed lines 22-25 before), and the sequence of degrees can be realized by connecting nodes in $\nabla^*$ into a clique. In what follows let $L$ denote the left-hand side of Equation 1, and let $R$ denote the right-hand side of Equation 1. We will now show that Equation 1 holds for nodes in $\nabla^*$, by performing calculations for four different cases:
• Case I, $k \geq m \land z \geq k$:
$L = (z + 1) m + (k - m) z = kz + m$
$R = k(k - 1) + (n - k) k = kn - k$
$R - L = kn - k - kz - m \geq k(z + 2) - 2k - kz = 0$;
• Case II, $k \geq m \land z < k$:
$L = (z + 1) m + (k - m) z = kz + m$
$R = k(k - 1) + (n - k) z = k^2 + nz - k - kz$
$R - L = k^2 + nz - k - 2kz - m \geq k(z + 2) + nz - 2k - 2kz = (n - k) z \geq 0$;
• Case III, $k < m \land z \geq k$:
$L = k(z + 1) = kz + k$
$R = k(k - 1) + (n - k) k = nk - k$
$R - L = nk - kz - 2k = k(n - (z + 2)) \geq 0$;
• Case IV, $k < m \land z < k$:
$L = k(z + 1) = kz + k$
$R = k(k - 1) + (m - k)(z + 1) + (n - m) z = k^2 + m + nz - 2k - kz$
$R - L = k^2 + m + nz - 3k - 2kz = k^2 + m - k + nz - 2k(z + 1) + (z + 1)^2 - (z + 1)^2 \geq (k - z - 1)^2 + (m - k) + (z + 2) z - (z + 1)^2 = (k - z - 1)^2 + (m - k) - 1 \geq 0$, where the last inequality uses $m > k$.

Finally, notice that if $m = 0$ and so far we only added edges between the members of $\nabla^*$, we need to connect them to the rest of the network, which we do in lines 30-31. Thanks to our assumption that the function $\mathrm{select}$ prioritizes nodes in $\widehat{S}$, if we added at least one edge between a member of $\nabla^*$ and a node from outside $\nabla^*$ then the network is already connected.

Theorem 2.
The problem of Hiding Source by Adding Nodes is NP-complete given the Closeness source detection algorithm.

Proof. The problem is trivially in NP, since after adding a given set of edges $A^*$, it is possible to compute the closeness centrality ranking of all nodes in $G_I$ in polynomial time.

We will now prove that the problem is NP-hard. To this end, we will show a reduction from the NP-complete Dominating Set problem. The decision version of this problem is defined by a network, $H = (V', E')$, where $V' = \{v_1, \ldots, v_n\}$, and a constant $k \in \mathbb{N}$, where the goal is to determine whether there exists $V^* \subseteq V'$ such that $|V^*| = k$ and every node outside $V^*$ has at least one neighbor in $V^*$, i.e., $\forall_{v \in V' \setminus V^*}\ N_H(v) \cap V^* \neq \emptyset$.

Let $(H, k)$ be a given instance of the Dominating Set problem. We will now construct an instance of the Hiding Source by Adding Nodes problem.

First, let us construct a network $G = (V, E)$ where:
• $V = V' \cup \{v^\dagger, x, u, w, a_1, a_2, a_3\} \cup \bigcup_{i=1}^{2n+k-1} \{y_i\}$,
• $E = E' \cup \{(v^\dagger, x), (x, u), (u, w), (w, a_1), (w, a_2), (w, a_3)\} \cup \bigcup_{i=1}^{2n+k-1} \{(u, y_i)\} \cup \bigcup_{i=1}^{n} \{(w, v_i)\}$.
An example of the construction of the network $G$ is presented in Figure 7.

Figure 7: Construction of the network used in the proof of Theorem 2.
Green dotted edges are allowed to be added.

Now, consider the instance $(G, v^\dagger, I, \sigma, \omega, b, \nabla, \widehat{S})$ of the Hiding Source by Adding Nodes problem, where:
• $G$ is the network we just constructed,
• $v^\dagger$ is the evader,
• $I = V$,
• $\sigma$ is the Closeness source detection algorithm,
• $\omega = 3n + k + 6$ is the safety threshold,
• $b = k$, where $k$ is the size of the dominating set from the Dominating Set problem instance,
• $\nabla = \{\delta\}$,
• $\widehat{S} = V'$, i.e., the additional node can only be connected with the nodes in $V'$.

First, let us analyze the closeness centrality values of the nodes in $G$ after the addition of any $A \subseteq \nabla \times \widehat{S}$. Let $D_v$ denote the sum of distances from $v$ to all nodes in the network, i.e., $D_v = \sum_{w \in V \cup \nabla} d(v, w)$. Notice that $\sigma_{clos}(v, G, I) = 1 / \sum_{w \in I} d(v, w)$ decreases as $D_v$ grows, which implies that a greater value of $D_v$ leads to a lower position in the ranking of nodes according to the Closeness source detection algorithm. Moreover, let $z_A$ denote the sum of distances between $\delta$ and the members of $V'$ after the addition of $A$. Table 3 presents the computation of $D_v$ for every node $v \in V \cup \nabla$ after the addition of a given $A \subseteq \nabla \times \widehat{S}$.

Since the safety threshold is $\omega = 3n + k + 6$, all other nodes (including $\delta$) must have greater closeness centrality than $v^\dagger$ after the addition of a given $A$ in order for the said $A$ to be a solution to the constructed instance of the problem of Hiding Source by Adding Nodes.

| $v$ | $d(v, v^\dagger)$ | $d(v, x)$ | $d(v, u)$ | $d(v, w)$ | $\sum_j d(v, y_j)$ | $\sum_j d(v, a_j)$ | $\sum_j d(v, v_j)$ | $d(v, \delta)$ | $D_v$ |
|---|---|---|---|---|---|---|---|---|---|
| $v^\dagger$ | 0 | 1 | 2 | 3 | $3(2n+k-1)$ | 12 | $4n$ | 5 | $10n + 3k + 20$ |
| $x$ | 1 | 0 | 1 | 2 | $2(2n+k-1)$ | 9 | $3n$ | 4 | $7n + 2k + 15$ |
| $u$ | 2 | 1 | 0 | 1 | $2n+k-1$ | 6 | $2n$ | 3 | $4n + k + 12$ |
| $w$ | 3 | 2 | 1 | 0 | $2(2n+k-1)$ | 3 | $n$ | 2 | $5n + 2k + 9$ |
| $y_i$ | 3 | 2 | 1 | 2 | $2(2n+k-2)$ | 9 | $3n$ | 4 | $7n + 2k + 17$ |
| $a_i$ | 4 | 3 | 2 | 1 | $3(2n+k-1)$ | 4 | $2n$ | 3 | $8n + 3k + 14$ |
| $v_i$ | 4 | 3 | 2 | 1 | $3(2n+k-1)$ | 6 | $\leq 2n - 2$ | $\leq 3$ | $\leq 8n + 3k + 14$ |
| $\delta$ | 5 | 4 | 3 | 2 | $4(2n+k-1)$ | 9 | $z_A$ | 0 | $8n + 4k + 19 + z_A$ |

Table 3: Sums of distances between nodes of the network after the addition of $A \subseteq \nabla \times \widehat{S}$, used in the proof of Theorem 2.

Notice that after adding any $A$ to the network we have $D_v < D_{v^\dagger}$ for any $v \in V \setminus \{v^\dagger, \delta\}$ (based on the formulas for $D_v$ in Table 3). Hence, a given $A$ is a solution to the constructed instance of the Hiding Source by Adding Nodes problem if and only if we have $D_\delta < D_{v^\dagger}$ after the addition of $A$.

Let us now analyze the value of $D_\delta$ after the addition of a given $A$. The nodes of $V'$ adjacent to $\delta$ are at distance 1 from it, the nodes in $V'_A = \{v_i \in V' \setminus N(\delta) : N(v_i) \cap N(\delta) \neq \emptyset\}$ are at distance 2, and all remaining nodes of $V'$ are at distance 3 (through $w$). We have that:

$$D_\delta = 8n + 4k + 19 + z_A = 8n + 4k + 19 + |A| + 2|V'_A| + 3(n - |A| - |V'_A|) = 11n + 4k + 19 - 2|A| - |V'_A|.$$

Notice we have that $|V'_A| \leq n - |A|$, which gives us:

$$D_\delta \geq 11n + 4k + 19 - 2|A| - (n - |A|) = 10n + 4k + 19 - |A|.$$

Hence, given that $D_{v^\dagger} = 10n + 3k + 20$ and $|A| \leq b = k$, we have that $D_\delta < D_{v^\dagger}$ if and only if $2|A| + |V'_A| \geq n + k$, which holds if and only if $|A| = k$ and $|V'_A| = n - k$, i.e., $\delta$ is connected with $k$ nodes in $V'$ and every other node in $V'$ has a neighbor who is connected with $\delta$.

We will now show that the constructed instance of the Hiding Source by Adding Nodes problem has a solution if and only if the given instance of the Dominating Set problem has a solution.

Assume that there exists a solution to the given instance of the Dominating Set problem, i.e., a subset $V^* \subseteq V'$ of size $k$ such that all other nodes have a neighbor in $V^*$. After adding to $G$ the set $A^* = \{\delta\} \times V^*$ we have that $|A^*| = k$ and every node in $V' \setminus N(\delta)$ has a neighbor who is connected with $\delta$.
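The distance sums used in this argument can be checked numerically on a small instance. The following is a minimal sketch, not part of the original proof: it assumes the construction uses $2n + k - 1$ nodes $y_i$ (the count for which $D_{v^\dagger} = 10n + 3k + 20$), and the node labels `"evader"` and `"delta"`, the helper names, and the toy graph $H$ (a path on four nodes with $k = 2$) are illustrative choices.

```python
from collections import deque
from itertools import combinations


def build_graph(n, h_edges, k, delta_neighbors):
    """Build the Theorem 2 network G plus the added node 'delta'.

    Nodes of H are the integers 0..n-1; every other node is a string label.
    The number of y nodes (2n + k - 1) is an assumption of this sketch.
    """
    adj = {}

    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for a, b in h_edges:                  # E' (edges of H)
        add_edge(a, b)
    add_edge("evader", "x")               # path evader - x - u - w
    add_edge("x", "u")
    add_edge("u", "w")
    for i in range(1, 4):                 # a_1, a_2, a_3 hang off w
        add_edge("w", f"a{i}")
    for i in range(1, 2 * n + k):         # y_1 .. y_{2n+k-1} hang off u
        add_edge("u", f"y{i}")
    for v in range(n):                    # w is adjacent to all of V'
        add_edge("w", v)
    for v in delta_neighbors:             # the added edges A
        add_edge("delta", v)
    return adj


def distance_sum(adj, source):
    """Return D_source, the sum of BFS distances from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return sum(dist.values())


def is_dominating(n, h_edges, chosen):
    """True if every node of H outside `chosen` has a neighbor in `chosen`."""
    nbrs = {v: set() for v in range(n)}
    for a, b in h_edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return all(v in chosen or nbrs[v] & set(chosen) for v in range(n))


def evader_hidden(n, h_edges, k, chosen):
    """True if delta ends up closer to the network than the evader."""
    adj = build_graph(n, h_edges, k, chosen)
    return distance_sum(adj, "delta") < distance_sum(adj, "evader")


def equivalence_holds(n, h_edges, k):
    """Check 'hidden iff dominating' for every choice of k neighbors of delta."""
    return all(
        evader_hidden(n, h_edges, k, chosen) == is_dominating(n, h_edges, chosen)
        for chosen in combinations(range(n), k)
    )


# Toy instance: H is the path 0 - 1 - 2 - 3, and k = 2.
N, K = 4, 2
H_EDGES = [(0, 1), (1, 2), (2, 3)]
```

Calling `equivalence_holds(N, H_EDGES, K)` compares the two conditions for all six ways of attaching `"delta"` to two nodes of the path; larger instances can be explored the same way, at a cost that grows with the number of $k$-subsets.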
By the characterization above, $A^*$ is then a solution to the constructed instance. We showed that if there exists a solution to the given instance of the Dominating Set problem, then there also exists a solution to the constructed instance of the Hiding Source by Adding Nodes problem.

Assume that there exists a solution $A^*$ to the constructed instance of the Hiding Source by Adding Nodes problem. As shown above, we must have $|A^*| = k$ and every node in $V' \setminus N(\delta)$ has a neighbor who is connected with $\delta$. Therefore $\{v_i \in V' : (\delta, v_i) \in A^*\}$ is a dominating set in $H$ of size exactly $k$. We showed that if there exists a solution to the constructed instance of the Hiding Source by Adding Nodes problem, then there also exists a solution to the given instance of the Dominating Set problem.

This concludes the proof.

Theorem 3.
The problem of Hiding Source by Adding Nodes is NP-complete given the Betweenness source detection algorithm.

Proof.
The problem is trivially in NP, since after adding a given set of edges A ∗ , it is possible to computethe betweenness centrality ranking of all nodes in G I in polynomial time.We will now prove that the problem is NP-hard. To this end, we will show a reduction from theNP-complete Finding k -Clique problem. The decision version of this problem is defined by a network, H = ( V (cid:48) , E (cid:48) ) , where V (cid:48) = { v , . . . , v n } , and a constant k ∈ N , where the goal is to determine whether thereexist k nodes forming a clique in H .Let ( H, k ) be a given instance of the Finding k -Clique problem. Let us assume that k ≥ , all otherinstances can be easily solved in polynomial time. We will now construct an instance of the Hiding Sourceby Modifying Edges problem.First, let us construct a network G = ( V, E ) where:21 V = V (cid:48) ∪ { v † , δ, u, w } ∪ (cid:83) v i ,v j ∈ V (cid:48) :( v i ,v j ) / ∈ E (cid:48) { ¯ e i,j } ∪ (cid:83) ki =1 { x i } ∪ (cid:83) k i =1 { y i } ,• E = { w }× ( V \{ δ } ) ∪ (cid:83) ¯ e i,j ∈ V { (¯ e i,j , v i ) , (¯ e i,j , v j ) }∪ (cid:83) y i ,y j ∈ V { ( y i , y j ) }∪ (cid:83) ki =1 { ( v † , x i ) }∪{ ( u, x ) , ( u, x ) } .In what follows we denote the set of nodes x , . . . , x k by X , and the set of nodes y , . . . , y k by Y . Noticethat a node ¯ e i,j exists in V if and only if v i , v j are not connected in H . An example of the construction ofthe network G is presented in Figure 8. … 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝜹𝜹 𝐻𝐻 𝐺𝐺 𝒘𝒘�𝒆𝒆 , �𝒆𝒆 , 𝒙𝒙 𝒙𝒙 𝒙𝒙 𝒗𝒗 † 𝒖𝒖𝒚𝒚 𝒚𝒚 𝒚𝒚 𝒚𝒚 Figure 8: Construction of the network used in the proof of Theorem 3. Green dotted edges are allowed tobe added. 
Edges incident with w are printed grey for better readability.Now, consider the instance ( G, v † , I, σ, ω, b, ∇ , (cid:98) S ) of the Hiding Source by Adding Nodes problem, where:• G is the network we just constructed,• v † is the evader,• I = V ,• σ is the Betweenness source detection algorithm,• ω = k + 2 is the safety threshold,• b = k is the budget of the evader,• ∇ = { δ } ,• (cid:98) S = V (cid:48) , i.e., additional node can only be connected with the nodes in V (cid:48) .Let C v denote the value of σ betw ( v, G, I ) , i.e., the value used to compute determining the source ofdiffusion. To remind the reader: C v = (cid:88) v (cid:48) (cid:54) = v (cid:48)(cid:48) : v (cid:48) ,v (cid:48)(cid:48) ∈ I \{ v } |{ π ∈ Π( v (cid:48) , v (cid:48)(cid:48) ) : v ∈ π }|| Π( v (cid:48) , v (cid:48)(cid:48) ) | where Π( v (cid:48) , v (cid:48)(cid:48) ) is the set of shortest paths between the nodes v (cid:48) and v (cid:48)(cid:48) .We will now make the following observations about the values of C v in G after adding an arbitrary A ⊆ { δ } × (cid:98) S :• C v † = k ( k − − , as it controls one of three shortest paths between x and x (the others beingcontrolled by w and u ), and one of two shortest paths between all other pair of nodes in X (the otherbeing controlled by w ),• C u = , as it controls one of three shortest paths between x and x (the others being controlled by w and v † ), 22 C x i = 0 , as it does not control any shortest paths,• C a i = 0 , as it does not control any shortest paths,• C w ≥ k ( k + n + 4) , as it controls all shortest paths between nodes in Y and all other nodes,• C ¯ e i,j ≤ , as it controls one of at least two shortest paths between v i and v j (the other being controlledby w ),• C v i = 0 if v i is not connected with δ , as it does not control any shortest paths,• C v i ≥ k | A | if v i is connected with δ , as it controls one of | A | paths between δ and nodes in Y (the othersbeing controlled by other nodes in V (cid:48) connected to 
$\delta$),
• $C_\delta = \frac{1}{3} z_A + \frac{1}{2} \left( \frac{|A|(|A|-1)}{2} - z_A \right)$, where $z_A$ is the number of pairs $v_i, v_j \in V'$ connected with $\delta$ such that $\bar{e}_{i,j} \in V$, as $\delta$ controls one of three shortest paths between such pairs of $v_i, v_j \in V'$ (the others being controlled by $w$ and $\bar{e}_{i,j}$), while $\delta$ controls one of two shortest paths between all other pairs of $v_i, v_j \in V'$ that it is connected to (the other being controlled by $w$).

Notice that we have $C_{v^\dagger} < k^2$, and since we assumed that $k \geq 3$ we also have $C_{v^\dagger} \geq 1$. Hence, the only nodes that can have greater value of $C_v$ (and higher position in the source detection algorithm ranking) are $w$, $\delta$ and nodes in $V'$ connected to $\delta$. Since the safety threshold is $\omega = k + 2$ and the evader's budget is $b = k$, it implies that $\delta$ must be connected with exactly $k$ nodes in $V'$. Notice also that $w$ and nodes in $V'$ connected to $\delta$ have greater betweenness centrality than $v^\dagger$ no matter the choice of $A$. Hence, a given set $A$ is a solution to the constructed instance of the Hiding Source by Adding Nodes problem if and only if $\delta$ is connected with exactly $k$ nodes from $V'$ and $\delta$ has greater betweenness centrality than $v^\dagger$.

Let us now analyze the betweenness centrality of $\delta$ when it is connected with $k$ nodes in $V'$ (in which case $|A| = k$):

$$C_\delta = \frac{1}{3} z_A + \frac{1}{2} \left( \frac{|A|(|A|-1)}{2} - z_A \right) = \frac{k(k-1)}{4} - \frac{z_A}{6},$$

where $z_A$ is the number of pairs $v_i, v_j \in V'$ connected with $\delta$ such that $\bar{e}_{i,j} \in V$. Notice that if $z_A = 0$ then $C_\delta = \frac{k(k-1)}{4} > C_{v^\dagger}$. However, if $z_A \geq 1$ then $C_\delta \leq \frac{k(k-1)}{4} - \frac{1}{6} = C_{v^\dagger}$. Notice that $z_A = 0$ only when the nodes connected with $\delta$ form a clique in $H$ (as node $\bar{e}_{i,j}$ is added to $V$ only when nodes $v_i$ and $v_j$ are not neighbors in $H$). Hence, $\delta$ has greater betweenness centrality than $v^\dagger$ if and only if nodes in $V'$ connected with $\delta$ form a clique in $H$.

We will now show that the constructed instance of the Hiding Source by Adding Nodes problem has a solution if and only if the given instance of the Finding $k$-Clique problem has a solution.

Assume that there exists a solution to the given instance of the Finding $k$-Clique problem, i.e., a subset $V^* \subseteq V'$ forming a $k$-clique in $H$. Notice that for $A^* = \{\delta\} \times V^*$ we have $|A^*| = k$ and $z_{A^*} = 0$. We showed that if there exists a solution to the given instance of the Finding $k$-Clique problem, then there also exists a solution to the constructed instance of the Hiding Source by Adding Nodes problem.

Assume that there exists a solution $A^*$ to the constructed instance of the Hiding Source by Adding Nodes problem. As observed above, we must have $|A^*| = k$ and the nodes that $\delta$ is connected with, i.e., nodes $V^* = \{v_i \in V' : (\delta, v_i) \in A^*\}$, must form a clique in $H$. We showed that if there exists a solution to the constructed instance of the Hiding Source by Adding Nodes problem, then there also exists a solution to the given instance of the Finding $k$-Clique problem.

This concludes the proof.

Theorem 4.
The problem of Hiding Source by Adding Nodes is NP-complete given the Rumor source detection algorithm.

Proof.
The problem is trivially in NP, since after adding a given set of edges A ∗ , it is possible to computethe rumor centrality ranking of all nodes in G I in polynomial time.23e will now prove that the problem is NP-hard. To this end, we will show a reduction from theNP-complete Exact 3-Set Cover problem. The decision version of this problem is defined by a universe, U = { u , . . . , u k } , and a collection of sets S = { S , . . . , S | S | } such that ∀ i S i ⊂ U and ∀ i | S i | = 3 , where thegoal is to determine whether there exist k elements of S the union of which equals U .Let ( U, S ) be a given instance of the Exact 3-Set Cover problem. Assume that k ≥ , all other instancescan be easily solved in polynomial time. We will now construct an instance of the Hiding Source by AddingNodes problem.First, let us construct a network G = ( V, E ) where:• V = { v † , w, x, a , a } ∪ U ∪ S ∪ (cid:83) | S | i =1 { y i } ∪ (cid:83) ki =1 { Q i } ∪ (cid:83) ki =1 { z i } ,• E = { ( w, x ) , ( w, v † ) , ( x, a ) , ( x, a ) }∪ ( { w } × ( Y ∪ S ∪ Q ∪ Z )) ∪ ( Z × U ) ∪ (cid:83) u i ∈ S j { ( u i , S j ) }∪ (cid:83) u i { ( u i , Q (cid:100) i (cid:101) ) } .We denote the set of nodes { a , a } by A , the set of nodes { y , . . . , y | S | } by Y , the set of nodes { z , . . . , z k } by Z , and the set of nodes { Q , . . . , Q k } by Q . An example of the construction of the network G is presentedin Figure 9. 𝜹𝜹 𝒘𝒘 𝐺𝐺 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝑆𝑆 𝑆𝑆 𝑆𝑆 𝒗𝒗 † 𝒂𝒂 𝒂𝒂 𝒙𝒙 𝒚𝒚 𝒚𝒚 | 𝑺𝑺 | … 𝒛𝒛 𝒛𝒛 𝒛𝒛 𝒛𝒛 𝒛𝒛 𝒛𝒛 𝑺𝑺 𝑺𝑺 𝑺𝑺 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝑸𝑸 𝑸𝑸 Figure 9: Construction of the network used in the proof of Theorem 4. Green dotted edges are allowed tobe added.Now, consider the instance ( G, v † , I, σ, ω, b, ∇ , (cid:98) S ) of the Hiding Source by Adding Nodes problem, where:• G is the network we just constructed,• v † is the evader,• I = V , i.e., all nodes in G are infected,• σ is the Rumor source detection algorithm,• ω = 1 ,• b = k + 2 ,• ∇ = { δ } ,• (cid:98) S = { w, x } ∪ S . 
24o remind the reader, the score assigned to a given node by the Rumor source detection algorithm is σ rumor ( v, G, I ) = | I | ! (cid:81) w ∈ I Θ vw where Θ vw is the size of the subtree of w in the BFS tree of G I rooted at v .Let C v ( A ) denote (cid:81) w ∈ I \{ v } Θ vw in G after the addition of A . Notice that greater C v ( A ) implies lower σ rumor ( v, G ∪ A, I ) and vice versa, as we have σ rumor ( v, G ∪ A, I ) = | I | ! | V | C v ( A ) .Notice that a BFS tree of a given node can be constructed in many different ways, which makes theRumor source detection algorithm nondeterministic.First, let us compute the value of C v † ( A ) . Figure 10 presents the BFS tree of v † . The location of v † inthe tree depends on the connections included in A , the three cases are:• if ( δ, w ) ∈ A then δ is in location 1,• if ( δ, w ) / ∈ A ∧ ( δ, x ) ∈ A then δ is either in location 2 or in location 3,• if ( δ, w ) / ∈ A ∧ ( δ, x ) / ∈ A then δ is in location 3.The location of node δ in the BFS tree of v † determines the value of C v † ( A ) as follows:• if δ is in location 1 then C v † ( A ) = 4 k | S | + 3 k + 5) ,• if δ is in location 2 then C v † ( A ) = 4 k | S | + 3 k + 5) ,• if δ is in location 3 then C v † ( A ) = 4 k | S | + 3 k + 5) . … … 𝑣𝑣 † 𝑄𝑄 𝑤𝑤𝑈𝑈 𝑆𝑆 𝑍𝑍 𝑌𝑌 𝐴𝐴 𝑥𝑥 𝛿𝛿𝛿𝛿 𝛿𝛿
Figure 10: BFS tree of node v † , with three possiblelocations of node δ . 𝑤𝑤 … 𝑣𝑣 † 𝑄𝑄𝑍𝑍 𝑆𝑆 𝑌𝑌 𝐴𝐴 𝑥𝑥 𝛿𝛿𝛿𝛿 𝛿𝛿 𝑈𝑈 Figure 11: BFS tree of node w , with three possiblelocations of node δ .Hence, we have that C v † ( A ) ≤ k | S | +3 k +5) . We will now show that for every node v ∈ V \{ v † , δ } andevery set of edges A that can be added to G , there exists a BFS tree of v such that C v ( A ) ≥ k | S | +3 k +5) .Since the Hiding Source by Adding Nodes problem requires the safety threshold to be maintained in everyrealization of the source detection algorithm, none of such v can contribute to the safety threshold.• C w ( A ) ≥ C v † ( A ) : Figure 11 presents a possible BFS tree of w with three potential locations of node δ . We can observe that the value of C w ( A ) is minimal when δ is connected with w (location 1 inFigure 11), which gives us: C w ( A ) ≥ k . Since we know that C v † ( A ) ≤ k | S | + 3 k + 5) , we have that: C w ( A ) C v † ( A ) ≥ k k | S | + 3 k + 5) = 2 k | S | + 6 k + 10 . Notice that we can assume that the set S contains at most one copy of each 3-element subset of U , asa solution to the given instance of the Exact 3-Set Cover problem never contains two instance of the25ame subset (in fact, since U has k elements, there is never any overlap between two elements of asolution). Hence, we can assume that | S | ≤ (3 k )!(3 k − = k (3 k − k − < k , which gives us: C w ( A ) C v † ( A ) > k k + 6 k + 10 . Now, notice that for k ≥ , the function k − k − k − is increasing (its derivative is k ln(2) − k − ) and its value for k = 17 is greater than zero. Hence, we have that k > k + 6 k + 10 , whichgives us: C w ( A ) C v † ( A ) > . • C x ( A ) ≥ C v † ( A ) : consider a BFS tree of x obtained by rooting the tree presented in Figure 11 in x instead of in w . We can observe that the value of C x ( A ) is minimal when δ is connected with x (location 2 in Figure 11), which gives us: C x ( A ) ≥ k (2 | S | + 7 k + 2) . 
Therefore, we have that: C x ( A ) C v † ( A ) ≥ k (2 | S | + 7 k + 2)4 k | S | + 3 k + 5) = 2 k (2 | S | + 7 k + 2)6(2 | S | + 3 k + 5) > . • C a i ( A ) ≥ C v † ( A ) : consider a BFS tree of a i obtained by rooting the tree presented in Figure 11 in a i instead of in w . We can observe that the value of C a i ( A ) is minimal when δ is connected with x (location 2 in Figure 11), which gives us: C a i ( A ) ≥ k (2 | S | + 7 k + 5)(2 | S | + 7 k + 2) . Therefore, we have that: C a i ( A ) C v † ( A ) ≥ k (2 | S | + 7 k + 5)(2 | S | + 7 k + 2)4 k | S | + 3 k + 5) = 2 k (2 | S | + 7 k + 5)(2 | S | + 7 k + 2)6(2 | S | + 3 k + 5) > . • C y i ( A ) ≥ C v † ( A ) : consider the BFS tree of y i obtained by rooting the tree presented in Figure 11 in y i instead of in w . We can observe that the value of C y i ( A ) is minimal when δ is connected with w (location 1 in Figure 11), which gives us: C y i ( A ) ≥ k | S | + 7 k + 5) . Therefore, we have that: C y i ( A ) C v † ( A ) ≥ k | S | + 7 k + 5)4 k | S | + 3 k + 5) = 2 k − (2 | S | + 7 k + 5)2 | S | + 3 k + 5 > . • C Q i ( A ) ≥ C v † ( A ) : Figure 12 presents a possible BFS tree of Q i with three potential locations of node δ . We can observe that the value of C Q i ( A ) is minimal when δ is connected with w (location 1 inFigure 12), which gives us: C Q i ( A ) ≥ k − | S | + 7 k + 1) . Therefore, given the assumption that k ≥ , we have that: C Q i ( A ) C v † ( A ) ≥ k − | S | + 7 k + 1)4 k | S | + 3 k + 5) = 2 k − (2 | S | + 7 k + 1)2 | S | + 3 k + 5 > . … 𝑄𝑄𝑍𝑍 𝑆𝑆 𝑌𝑌 𝐴𝐴 𝑥𝑥 𝛿𝛿𝛿𝛿 𝛿𝛿
Figure 12: BFS tree of node $Q_i$, with three possible locations of node $\delta$.
𝑈𝑈𝑈𝑈 𝑆𝑆 𝑖𝑖 𝛿𝛿 Figure 13: BFS tree of node S i , with four possiblelocations of node δ .• C S i ( A ) ≥ C v † ( A ) : Figure 13 presents a possible BFS tree of S i with four potential locations of node δ . We can observe that the value of C S i ( A ) is minimal when δ is connected with S i (location 4 inFigure 13), which gives us: C S i ( A ) ≥ k − | S | + 7 k + 1) . Therefore, given assumption that k ≥ , we have that: C S i ( A ) C v † ( A ) ≥ k − | S | + 7 k + 1)4 k | S | + 3 k + 5) = 2 k − (2 | S | + 7 k + 1)2 | S | + 3 k + 5 > . 𝑧𝑧 𝑖𝑖 𝑤𝑤 … 𝑄𝑄𝑈𝑈 𝑆𝑆 𝑌𝑌 𝐴𝐴 𝑥𝑥 𝛿𝛿𝛿𝛿 𝛿𝛿
Figure 14: BFS tree of node $z_i$, with three possible locations of node $\delta$.
𝑈𝑈 𝑣𝑣 † 𝑄𝑄 𝑗𝑗 𝑆𝑆𝛿𝛿 Figure 15: BFS tree of node u i , with four possiblelocations of node δ .• C z i ( A ) ≥ C v † ( A ) : Figure 14 presents a possible BFS tree of z i with four potential locations of node δ . We can observe that the value of C z i ( A ) is minimal when δ is connected with w (location 1 inFigure 14), which gives us: C z i ( A ) ≥ k − | S | + k + 6) . Therefore, we have that: C z i ( A ) C v † ( A ) ≥ k − | S | + k + 6)4 k | S | + 3 k + 5) = 2 k − (2 | S | + k + 6)2 | S | + 3 k + 5 > . • C u i ( A ) ≥ C v † ( A ) : Figure 15 presents a possible BFS tree of u i with four potential locations of node δ . We can observe that the value of C u i ( A ) is minimal when δ is connected with w (location 1 inFigure 15), which gives us: C u i ( A ) ≥ k − | S | + k + 6)( | S | + k + 4) . C u i ( A ) C v † ( A ) ≥ k − | S | + k + 6)( | S | + k + 4)4 k | S | + 3 k + 5) = 2 k − ( | S | + k + 6)( | S | + k + 4)2 | S | + 3 k + 5 > . We showed that δ is the only node that can contribute to satisfying the safety threshold. Therefore,a given A is a solution to the constructed instance of the Hiding Source by Adding Nodes problem if andonly if we have that C v † ( A ) > C δ ( A ) . We will now show that C v † ( A ) > C δ ( A ) for a given A if and only if ( δ, w ) ∈ A , and ( δ, x ) ∈ A , and for every u i ∈ U there exists S j ∈ S such that u i ∈ S j and ( δ, S j ) ∈ A . … … 𝛿𝛿𝑆𝑆 𝑤𝑤𝑈𝑈 𝑆𝑆 𝑍𝑍 𝑌𝑌𝐴𝐴 𝑥𝑥 𝑣𝑣 † 𝑄𝑄 Figure 16: BFS tree of node δ , when it is connectedwith w , x , and a cover of U . 𝑆𝑆 𝑖𝑖 𝑆𝑆 𝑗𝑗 𝑆𝑆 𝑖𝑖 𝑆𝑆 𝑗𝑗 𝑆𝑆 𝑖𝑖 𝑆𝑆 𝑗𝑗 𝑆𝑆 𝑙𝑙 𝑆𝑆 𝑖𝑖 𝑆𝑆 𝑗𝑗 𝑆𝑆 𝑙𝑙 𝑆𝑆 𝑖𝑖 𝑆𝑆 𝑙𝑙 𝑆𝑆 𝑗𝑗 𝑆𝑆 𝑖𝑖 𝑆𝑆 𝑗𝑗 𝑆𝑆 𝑙𝑙 O p e r a t i o n I O p e r a t i o n II O p e r a t i o n III
Figure 17: Operations used in the proof of Lemma 1.First, we will show that if ( δ, w ) ∈ A and ( δ, x ) ∈ A and ∀ u i ∈ U ∃ S j ∈ S ( u i ∈ S j ∧ ( δ, S j ) ∈ A ) , then forevery BFS tree of δ we have C v † ( A ) > C δ ( A ) . Figure 16 presents the BFS tree of δ in this case (notice thatin this particular case, this is the only possible structure of the BFS tree). We have that: C δ ( A ) = 4 k | S | + 3 k + 2) . At the same time, we have that: C v † ( A ) = 4 k | S | + 3 k + 5) . Therefore, we have that C v † ( A ) > C δ ( A ) .Before we move on, we will prove a useful lemma. Lemma 1.
Assume that in a BFS tree of $\delta$ all nodes from $U$ are leaves and are the only children of the nodes in $S$. The minimum over all possible values of $\prod_{S_i \in S} \Theta^\delta_{S_i}$ is $4^k$. The second lowest possible value is $6 \cdot 4^{k-1}$.

Proof. In what follows, let $\Theta_S$ denote $\prod_{S_i \in S} \Theta^\delta_{S_i}$. To remind the reader, there are $3k$ nodes in $U$ and every node $S_i$ is connected with exactly 3 of them. The value of $\Theta^\delta_{S_i}$ is therefore between 1 (if node $S_i$ has no children in the BFS tree) and 4 (if all three nodes from $U$ connected with $S_i$ are its children in the BFS tree). Hence, we have that $\Theta_S = 4^a 3^b 2^c$ such that $3a + 2b + c = 3k$.

Assume that $k$ nodes in $S$ have three children each. The value of $\Theta_S$ is then $4^k$. Notice that any possible value $4^a 3^b 2^c$ of $\Theta_S$ can be achieved starting from $4^k$ by performing the following operations (presented in Figure 17) in any order:
• operation I: moving one child of a node in $S$ with three children to another node in $S$ with no children, repeated $\min(b, c)$ times,
• operation II: moving two children of a node in $S$ with three children to two other nodes in $S$ with no children, repeated $\frac{c-b}{3}$ times if $c > b$, and not executed at all otherwise,
• operation III: moving one child each of two nodes in $S$ with three children to another node with no children, repeated $\frac{b-c}{3}$ times if $b > c$, and not executed at all otherwise.

Let $\Theta$ be the value of $\Theta_S$ before performing a given operation. Notice that each of the operations increases the value of $\Theta_S$:
• the value of $\Theta_S$ after operation I is $\frac{3}{2} \Theta$ (factors $4 \cdot 1$ become $3 \cdot 2$),
• the value of $\Theta_S$ after operation II is $2 \Theta$ (factors $4 \cdot 1 \cdot 1$ become $2 \cdot 2 \cdot 2$),
• the value of $\Theta_S$ after operation III is $\frac{27}{16} \Theta$ (factors $4 \cdot 4 \cdot 1$ become $3 \cdot 3 \cdot 3$).

Hence, since every possible value of $4^a 3^b 2^c$ can be achieved via performing a sequence of operations starting with $4^k$, and every operation increases the value, $4^k$ is the minimal possible value. Moreover, since operation I increases the value the least, the second minimal value is $\frac{3}{2} \cdot 4^k = 6 \cdot 4^{k-1}$.
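The two bounds of Lemma 1 can be confirmed by brute force for small $k$. The sketch below is an illustration rather than part of the proof: it enumerates every triple $(a, b, c)$ with $3a + 2b + c = 3k$ and treats each one as attainable, which assumes that $S$ contains enough suitable nodes; the function and constant names are hypothetical.

```python
from fractions import Fraction


def feasible_products(k):
    """Sorted values 4^a * 3^b * 2^c with 3a + 2b + c = 3k and a, b, c >= 0.

    a, b, c count the nodes of S with three, two, and one child in the BFS
    tree; nodes of S without children contribute a factor of 1.
    """
    values = set()
    for a in range(k + 1):
        for b in range((3 * k - 3 * a) // 2 + 1):
            c = 3 * k - 3 * a - 2 * b
            values.add(4 ** a * 3 ** b * 2 ** c)
    return sorted(values)


# Multiplicative effect of each operation on the product of subtree sizes.
OPERATION_FACTORS = {
    "I": Fraction(3 * 2, 4 * 1),        # (4, 1)    -> (3, 2)
    "II": Fraction(2 * 2 * 2, 4),       # (4, 1, 1) -> (2, 2, 2)
    "III": Fraction(3 * 3 * 3, 4 * 4),  # (4, 4, 1) -> (3, 3, 3)
}
```

For a given `k`, the first two entries of `feasible_products(k)` are the minimum and the second lowest value, and the smallest entry of `OPERATION_FACTORS` identifies the operation that increases the product the least.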
Figure 18: BFS tree of node $\delta$ for Case I.
Figure 19: BFS tree of node δ for Case II.Now, we will show that if ( δ, w ) / ∈ A or ( δ, x ) / ∈ A or ∃ u i ∈ U ∀ S j ∈ S : u i ∈ S j ( δ, S j ) / ∈ A , then there exists aBFS tree of δ such that C δ ( A ) ≥ C v † ( A ) . We show the proof for each of the cases separately:• Case I ( δ, w ) / ∈ A ∧ ( δ, x ) ∈ A : Figure 18 presents the structure of the BFS tree of δ for this case.Notice that, given Lemma 1, the value of C δ ( A ) is minimal when k nodes from S connected with δ cover the entire universe (we then have (cid:81) S i ∈ S Θ δS i = 4 k ). Notice that, if δ would be connected withmore than k nodes from S (which is possible, since the budget of the evader is k + 2 ), then at leastone element of U would be a neighbor of two nodes from S connected with δ , and, based on Lemma 1,we would be able to choose a BFS tree such that (cid:81) S i ∈ S Θ δS i ≥ k − . We have: C δ ( A ) ≥ k (2 | S | + 3 k + 2)(2 | S | + 3 k + 5) , as well as: C v † ( A ) ≤ k | S | + 3 k + 5) , which gives us: C δ ( A ) C v † ( A ) ≥ k (2 | S | + 3 k + 2)(2 | S | + 3 k + 5)4 k | S | + 3 k + 5) = (2 | S | + 3 k + 2)(2 | S | + 3 k + 5)6(2 | S | + 3 k + 5) > . • Case II ( δ, w ) / ∈ A ∧ ( δ, x ) / ∈ A : Figure 19 presents the structure of the BFS tree of δ for this case.Notice that, given Lemma 1, the value of C δ ( A ) is minimal when k nodes from S connected with δ cover the entire universe (we then have (cid:81) S i ∈ S Θ δS i = 4 k ). Notice that, if δ would be connected with29ore than k nodes from S (which is possible, since the budget of the evader is k + 2 ), then at leastone element of U would be a neighbor of two nodes from S connected with δ , and, based on Lemma 1,we would be able to choose a BFS tree such that (cid:81) S i ∈ S Θ δS i ≥ k − . 
We have: C δ ( A ) ≥ k − | S | + 3 k + 5)(2 | S | + 3 k + 8) , as well as: C v † ( A ) = 4 k | S | + 3 k + 5) , which gives us: C δ ( A ) C v † ( A ) ≥ k − | S | + 3 k + 5)(2 | S | + 3 k + 8)4 k | S | + 3 k + 5) = (2 | S | + 3 k + 5)(2 | S | + 3 k + 8)8(2 | S | + 3 k + 5) > . … …𝛿𝛿𝑆𝑆 𝑤𝑤𝑈𝑈 𝑆𝑆 𝑍𝑍 𝑌𝑌𝐴𝐴𝑥𝑥 𝑣𝑣 † 𝑄𝑄 𝑈𝑈
Figure 20: BFS tree of node $\delta$ for Case III.
Figure 21: BFS tree of node δ for Case IV.• Case III ( δ, w ) ∈ A ∧ ( δ, x ) / ∈ A : Figure 20 presents the structure of the BFS tree of δ for this case.Notice that, given Lemma 1, the value of C δ ( A ) is minimal when k nodes from S connected with δ cover the entire universe (we then have (cid:81) S i ∈ S Θ δS i = 4 k ). Notice that, if δ would be connected withmore than k nodes from S (which is possible, since the budget of the evader is k + 2 ), then at leastone element of U would be a neighbor of two nodes from S connected with δ , and, based on Lemma 1,we would be able to choose a BFS tree such that (cid:81) S i ∈ S Θ δS i ≥ k − . We have: C δ ( A ) ≥ k | S | + 3 k + 5) , as well as: C v † ( A ) = 4 k | S | + 3 k + 5) , which gives us: C δ ( A ) C v † ( A ) ≥ k | S | + 3 k + 5)4 k | S | + 3 k + 5) = 1 . • Case IV ( δ, w ) ∈ A ∧ ( δ, x ) ∈ A ∧ ∃ u i ∈ U ∀ S j ∈ S : u i ∈ S j ( δ, S j ) / ∈ A : Figure 21 presents the structure of theBFS tree of δ for this case. Since nodes in S connected with δ do not cover the entire universe, thereis at least one node from U connected z in the BFS tree. We have: C δ ( A ) ≥ k − | S | + 3 k + 3) , as well as: C v † ( A ) = 4 k | S | + 3 k + 5) , which gives us: C δ ( A ) C v † ( A ) ≥ k − | S | + 3 k + 3)4 k | S | + 3 k + 5) = 3(2 | S | + 3 k + 3)2(2 | S | + 3 k + 5) > .
We showed that a given $A$ is a solution to the constructed instance of the Hiding Source by Adding Nodes problem if and only if $(\delta, w) \in A$, and $(\delta, x) \in A$, and for every $u_i \in U$ there exists $S_j \in S$ such that $u_i \in S_j$ and $(\delta, S_j) \in A$. Finally, we are ready to prove the theorem.

Assume that there exists a solution to the given instance of the Exact 3-Set Cover problem, i.e., a subset $S^* \subseteq S$ of $k$ elements such that $\bigcup_{S_i \in S^*} S_i = U$. Then the set $\{\delta\} \times (\{w, x\} \cup S^*)$ is a solution to the constructed instance of the Hiding Source by Adding Nodes problem.

Similarly, assume that there exists a solution $A^*$ to the constructed instance of the Hiding Source by Adding Nodes problem. Hence, for every $u_i \in U$ there exists $S_j \in S$ such that $u_i \in S_j$ and $(\delta, S_j) \in A^*$. We also have that $\delta$ can be connected with at most $k$ nodes from $S$, as it has to also be connected with $w$ and $x$, and the budget of the evader is $k + 2$. Therefore, the set $\{S_i \in S : (\delta, S_i) \in A^*\}$ is a solution to the given instance of the Exact 3-Set Cover problem.

This concludes the proof.

Theorem 5.
The problem of Hiding Source by Adding Nodes is NP-complete given the Random Walk source detection algorithm.

Proof.
The problem is trivially in NP, since after adding a given set of edges $A^*$, it is possible to compute the Random Walk source detection algorithm scores of all nodes in $G_I$ in polynomial time.

We will now prove that the problem is NP-hard. To this end, we will show a reduction from the NP-complete 3-Set Cover problem. The decision version of this problem is defined by a universe, $U = \{u_1, \ldots, u_{|U|}\}$, and a collection of sets $S = \{S_1, \ldots, S_{|S|}\}$ such that $\forall_i\ S_i \subset U$ and $\forall_i\ |S_i| = 3$, where the goal is to determine whether there exist $k$ elements of $S$ the union of which equals $U$.

Let $(U, S)$ be a given instance of the 3-Set Cover problem. We will now construct an instance of the Hiding Source by Adding Nodes problem.

First, let us construct a network $G = (V, E)$ where:
• $V = \{v^\dagger, w, x, a\} \cup S \cup U$,
• $E = \{(v^\dagger, w), (v^\dagger, a), (w, x)\} \cup \left(\{v^\dagger\} \times S\right) \cup \bigcup_{S_i \in V} \bigcup_{u_j \in S_i} \{(S_i, u_j)\}$.

In what follows we will denote the set of nodes $S_1, \ldots, S_{|S|}$ by $S$, and we will denote the set of nodes $u_1, \ldots, u_{|U|}$ by $U$. Notice that having $u_j \in S_i$ in the last union in the formula of $E$ means that we connect a given node $S_i \in S$ only with nodes $u_j \in U$ corresponding to the elements contained in $S_i$ in the given instance of the 3-Set Cover problem. An example of the construction of the network $G$ is presented in Figure 22.

Figure 22: Construction of the network used in the proof of Theorem 5. Green dotted edges are allowed to be added.
Infected nodes in G are highlighted in black.Now, consider the instance ( G, v † , I, σ, ω, b, ∇ , (cid:98) S ) of the Hiding Source by Adding Nodes problem, where:• G is the network we just constructed, 31 v † is the evader,• I = V \ { a } , i.e., all nodes in G other than a are infected,• σ is the Random Walk source detection algorithm,• ω = | S | + | U | + 3 is the safety threshold,• b = k + 1 ,• ∇ = { δ } ,• (cid:98) S = { x } ∪ S .Moreover, the Random Walk source detection algorithm needs to be parameterized with the model ofspreading. Let the algorithm be parameterized with the model in which the probability of propagation is p = and the time of diffusion is T = 3 .Notice that since the safety threshold is ω = | S | + | U | + 3 , all nodes other than a (which is not infected)must have greater source detection algorithm scores than v † . In particular, after adding any non-empty A to the network, the score of v † is greater than zero (since all infected nodes are then within distance T = 3 from v † ). Therefore, if A is a solution to the constructed instance of the Hiding Source by Adding Nodesproblem, then all nodes in U have to be within distance T = 3 from the node x (otherwise their scores areall zero). This is the case if and only if δ is connected with x and for every node u i there exists a node S j such that u i ∈ S j and S j is connected with δ .We will now show the implication in the other direction, i.e., if for a given A we have that δ is connectedwith x and for every node u i there exists a node S j such that u i ∈ S j and S j is connected with δ , then A isa solution to the constructed instance of the Hiding Source by Adding Nodes problem. 
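The distance condition used above (all nodes in $U$ lie within distance $T = 3$ of $x$ exactly when $\delta$ is connected with $x$ and with a cover of $U$) can be tested exhaustively on a toy instance. This is a minimal sketch under stated assumptions: the labels `"evader"` and `"delta"`, the helper names, and the six-element universe with three sets are hypothetical, and the evader's budget is ignored so that every subset of allowed edges is tried.

```python
from collections import deque
from itertools import combinations


def build_network(universe, sets, added):
    """Theorem 5 network with the edges in `added` attached to 'delta'.

    `sets` maps a set label (e.g. "S1") to its three elements; `added` lists
    the nodes from {"x"} and the set labels that delta is connected with.
    """
    adj = {element: set() for element in universe}

    def add_edge(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    add_edge("evader", "w")
    add_edge("evader", "a")
    add_edge("w", "x")
    for label, members in sets.items():
        add_edge("evader", label)         # the evader is adjacent to every S_i
        for element in members:
            add_edge(label, element)      # S_i is adjacent to its elements
    for node in added:
        add_edge("delta", node)
    return adj


def within(adj, source, radius):
    """Set of nodes at BFS distance at most `radius` from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        if dist[cur] == radius:
            continue                      # do not expand beyond the radius
        for nxt in adj[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return set(dist)


def reachable_condition(universe, sets, added, radius=3):
    """True if every element of the universe is within `radius` of x."""
    adj = build_network(universe, sets, added)
    return set(universe) <= within(adj, "x", radius)


def cover_condition(universe, sets, added):
    """True if delta is linked to x and its set neighbors cover the universe."""
    covered = set()
    for node in added:
        if node in sets:
            covered |= set(sets[node])
    return "x" in added and covered == set(universe)


def equivalence_holds(universe, sets):
    """Compare both conditions for every subset of allowed edges."""
    candidates = ["x"] + sorted(sets)
    return all(
        reachable_condition(universe, sets, added)
        == cover_condition(universe, sets, added)
        for r in range(len(candidates) + 1)
        for added in combinations(candidates, r)
    )


# Hypothetical toy instance of 3-Set Cover.
UNIVERSE = ["u1", "u2", "u3", "u4", "u5", "u6"]
SETS = {
    "S1": ("u1", "u2", "u3"),
    "S2": ("u4", "u5", "u6"),
    "S3": ("u3", "u4", "u5"),
}
```

Without `"delta"`, the only route from `"x"` to an element of the universe passes through `"w"`, `"evader"`, and a set node, which takes four steps; this is why the shortcut through `"delta"` is required.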
v φ ( v ) φ ( v ) φ ( v ) φ ( v ) = σ rwalk ( v, G ∪ A, I ∪ { δ } ) v † − p | S | +2 − p (2 − p ) | S | +2 − p (80+40 | S | +51 p +25 p | S |− p − p | S |−| A | p )20( | S | +2) δ − | A |− | A | p | S | +2) w − p | S | +2) − p (3 − p )2( | S | +2) x − p | S | +2) u i ≥ − p | S | +2) S i ∈ N ( δ ) 1 1 1 − p | S | +2) − p (3 − p )5( | S | +2) S i / ∈ N ( δ ) 1 1 1 − p | S | +2) − p (3 − p )4( | S | +2) Table 4: The values of φ t after the addition of a given A such that ( δ, x ) ∈ A , used in the proof of Theorem 5.Table 4 presents the values of φ t for all infected nodes in the network and for different values of t , afterwe add to the network a given A such that ( δ, x ) ∈ A . We will now show that for every node v other v † wehave φ ( v ) > φ ( v † ) . Given that p = , we have:• φ ( v † ) = 1 − | S |−| A | | S | +2) < − | S | | S | +2) = 1 − | S | +2) ;• φ ( δ ) = 1 − | A |− | A | | S | +2) > − | S | +2) > φ ( v † ) ;• φ ( w ) = 1 − | S | +2) > φ ( v † ) ;• φ ( x ) = 1 − | S | +2) > φ ( v † ) ;• φ ( u i ) ≥ − | S | +2) > φ ( v † ) ; 32 if S i ∈ N ( δ ) then φ ( S i ) = 1 − | S | +2) > φ ( v † ) ;• if S i / ∈ N ( δ ) then φ ( S i ) = 1 − | S | +2) > φ ( v † ) .We showed that a given A is a solution to the constructed instance of the Hiding Source by Adding Nodesproblem if and only if δ is connected with x and for every node u i there exists a node S j such that u i ∈ S j and S j is connected with δ . We will now show that the constructed instance of the Hiding Source by AddingNodes problem has a solution if and only if the given instance of the -Set Cover problem has a solution.Assume that there exists a solution to the given instance of the -Set Cover problem, i.e., a subset S ∗ ⊆ S of size k the union of which is the universe U . 
In that case, the set {δ} × ({x} ∪ S∗) is a solution to the constructed instance of the Hiding Source by Adding Nodes problem.
Assume that there exists a solution A∗ to the constructed instance of the Hiding Source by Adding Nodes problem. In that case, S∗ = {S_i ∈ S : (δ, S_i) ∈ A∗} is a solution to the given instance of the 3-Set Cover problem.
This concludes the proof.

Theorem 6.
The problem of Hiding Source by Adding Nodes is NP-complete given the Monte Carlo source detection algorithm.
Proof.
The problem is trivially in NP, since after adding a given set of edges A ∗ , it is possible to generateMonte Carlo samples and compute the ranking of all nodes in G I in polynomial time.We will now prove that the problem is NP-hard. To this end, we will show a reduction from the NP-complete Dominating Set problem. The decision version of this problem is defined by a network, H =( V (cid:48) , E (cid:48) ) , where V (cid:48) = { v , . . . , v n } , and a constant k ∈ N , where the goal is to determine whether there exist V ∗ ⊆ V (cid:48) such that | V ∗ | = k and every node outside V ∗ has at least one neighbor in V ∗ , i.e., ∀ v ∈ V (cid:48) \ V ∗ N H ( v ) ∩ V ∗ (cid:54) = ∅ .Let ( H, k ) be a given instance of the Dominating Set problem. Let us assume that there is no solution ofsize one, i.e., ∀ v i ∈ V (cid:48) d H ( v i ) < n − . This can be easily checked in polynomial time. We will now constructan instance of the Hiding Source by Adding Nodes problem.First, let us construct a network G = ( V, E ) where:• V = V (cid:48) ∪ { v † , u, w, x } ∪ (cid:83) n − i =1 { a i } ,• E = E (cid:48) ∪ { ( v † , u ) , ( u, w ) , ( w, x ) } ∪ (cid:83) n − i =1 { ( a i , x ) } .An example of the construction of the network G is presented in Figure 23. 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝜹𝜹 𝐻𝐻 𝐺𝐺 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒘𝒘𝒙𝒙𝒂𝒂 𝒂𝒂 𝒖𝒖 𝒂𝒂 𝒗𝒗 † Figure 23: Construction of the network used in the proof of Theorem 6. Green dotted edges are allowed tobe added. 
Infected nodes in G are highlighted in black.Now, consider the instance ( G, v † , I, σ, ω, b, ∇ , (cid:98) S ) of the Hiding Source by Adding Nodes problem, where:• G is the network we just constructed, 33 v † is the evader,• I = { v † , u, w, x } ,• σ is the Monte Carlo source detection algorithm,• ω = 2 is the safety threshold,• b = k + 1 , where k is the size of the dominating set from the Dominating Set problem instance,• ∇ = { δ } ,• (cid:98) S = V (cid:48) ∪ { v † } .Moreover, the Monte Carlo source detection algorithm needs to be parameterized with the model ofspreading. Let the algorithm be parameterized with the model in which the probability of propagation is p = 1 and the time of diffusion is T = 3 . Notice that since the model with p = 1 is deterministic, all MonteCarlo samples starting from a given node always give the same result, hence the formula of the Monte Carlosource detection algorithm is in this case: σ mcarlo ( v, G, I ) = exp − (cid:16) | I ∩ I v || I ∪ I v | − (cid:17) a where I v is the set of infected nodes in the Monte Carlo sample starting the diffusion at v . Moreover, noticethat a position of a given node v ∈ I in the ranking generated by the Monte Carlo source detection algorithmdepends solely on the value of C v = | I ∩ I v || I ∪ I v | (in particular, the node with the greatest value of C v is selectedas the source of diffusion).Notice that the problem of Hiding Source by Adding Nodes requires the network induced by the infectednodes to be connected, hence δ must be connected with v † , and it may be connected with at most k nodesfrom V (cid:48) . Let V (cid:48) A,y denote the set of nodes from V (cid:48) in distance at most y from δ after the addition of A , i.e., V (cid:48) A,y = { v ∈ V (cid:48) : d ( V ∪∇ ,E ∪ A ) ( v † , δ ) ≤ y } . Notice that for any y we have | V (cid:48) A,y | ≤ n . 
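The deterministic case described above can be illustrated with a minimal Python sketch. It assumes that, with p = 1, a spread started at v reaches exactly the nodes within distance T of v in the whole network, and it ranks infected nodes by C_v = |I ∩ I_v| / |I ∪ I_v|. The function names and the toy path network are hypothetical and are not the network constructed in the proof.

```python
from collections import deque


def reach_within(adj, source, T):
    """Nodes within distance T of source (what a p = 1 spread infects in T steps)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == T:
            continue  # do not expand beyond the diffusion time
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def overlap_scores(adj, infected, T):
    """C_v = |I ∩ I_v| / |I ∪ I_v| for every infected node v."""
    scores = {}
    for v in infected:
        reached = reach_within(adj, v, T)
        scores[v] = len(infected & reached) / len(infected | reached)
    return scores


# Toy path 0-1-2-3-4-5 (an assumption for illustration only).
path = {i: set() for i in range(6)}
for i in range(5):
    path[i].add(i + 1)
    path[i + 1].add(i)

infected = {0, 1, 2, 3}
scores = overlap_scores(path, infected, T=2)
suspect = max(scores, key=scores.get)  # node with the greatest C_v
print(scores, suspect)
```

In this toy example, the node whose simulated spread coincides with the observed infected set obtains C_v = 1 and is ranked first.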
Let us compute thevalues of C v for the nodes in I (notice that these are the only nodes that can be selected as the source ofdiffusion by the Monte Carlo algorithm) after the addition of an arbitrary A :• C v † = | V (cid:48) A, | +5 ,• C δ = | V (cid:48) A, | +5 ≤ | V (cid:48) A, | +5 < | V (cid:48) A, | +5 = C v † ,• C u = n + | V (cid:48) A, | +4 ,• C w = n +4 ,• C x = n +4 < n +5 ≤ | V (cid:48) A, | +5 = C v † (as | V (cid:48) A, | ≤ n ).Since the safety threshold is ω = 2 , both u and w need to have a greater value of C v than v † . It is easyto see that C w > C v † if and only if | V (cid:48) A, | = n . Notice also that if | V (cid:48) A, | = n then also | V (cid:48) A, | > (and,consequently, C u = n + | V (cid:48) A, | +4 < n +5 = C v † ), as we assumed that there is no node with degree n − in H (and subnetwork induced by V (cid:48) in G is the same as H ). Therefore, the safety threshold is met if and onlyif | V (cid:48) A, | = n .We will now show that the constructed instance of the Hiding Source by Modifying Edges problem hasa solution if and only if the given instance of the Dominating Set problem has a solution.Assume that there exists a solution to the given instance of the Dominating Set problem, i.e., a subset V ∗ ⊆ V (cid:48) of size k such that all other nodes have a neighbor in V ∗ . After adding to G set A ∗ = { ( δ, v † ) } ∪ δ } × V ∗ we have that | A ∗ | = k + 1 and | V (cid:48) A ∗ , | = n , which implies that the safety threshold is met. Weshowed that if there exists a solution to the given instance of the Dominating Set problem, then there alsoexists a solution to the constructed instance of the Hiding Source by Adding Nodes problem.Assume that there exists a solution A ∗ to the constructed instance of the Hiding Source by ModifyingEdges problem. As shown above, we must have | V (cid:48) A ∗ , | = n , i.e., every node in V (cid:48) is either connected with δ or has a neighbor who is connected with δ . 
Moreover, the edge (δ, v†) must be part of A∗, and, since the budget of the evader is b = k + 1, there are at most k edges connecting δ with the nodes in V′. Therefore, V∗ = {v ∈ V′ : (δ, v) ∈ A∗} is a dominating set in H of size at most k (we can add k − |V∗| arbitrarily chosen elements to obtain a set of size exactly k). We showed that if there exists a solution to the constructed instance of the Hiding Source by Adding Nodes problem, then there also exists a solution to the given instance of the Dominating Set problem.
This concludes the proof.

Theorem 7.
The problem of Hiding Source by Modifying Edges is NP-complete given the Degree source detection algorithm.
Proof.
The problem is trivially in NP, since after adding and removing the given sets of edges A ∗ and R ∗ , itis possible to compute the degree centrality ranking of all nodes in G I in polynomial time.We will now prove that the problem is NP-hard. To this end, we will show a reduction from theNP-complete Finding k -Clique problem. The decision version of this problem is defined by a network, H = ( V (cid:48) , E (cid:48) ) , where V (cid:48) = { v , . . . , v n } , and a constant k ∈ N , where the goal is to determine whether thereexist k nodes forming a clique in H .Let ( H, k ) be a given instance of the Finding k -Clique problem. Let us assume that n = 2 , i.e., network H has at least two nodes. We will now construct an instance of the Hiding Source by Modifying Edgesproblem.First, let us construct a network G = ( V, E ) where:• V = V (cid:48) ∪ { v † } ∪ (cid:83) ni =1 (cid:83) n − k +1 j =1 { a i,j } ,• E = (cid:83) v i ∈ V { ( v † , v i ) } ∪ (cid:83) v i ∈ V (cid:83) a i,j ∈ V { ( v i , a i,j ) } .An example of the construction of the network G is presented in Figure 24. 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 † 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝐻𝐻 𝐺𝐺
Figure 24: Construction of the network used in the proof of Theorem 7. Green dotted edges are allowed to be added.
Now, consider the instance (G, v†, I, σ, ω, b, Â, R̂) of the Hiding Source by Modifying Edges problem, where:
• G is the network we just constructed,
• v† is the evader,
• I = V, i.e., all nodes in G are infected,
• σ is the Degree source detection algorithm,
• ω = k is the safety threshold,
• b = k(k − 1)/2,
• Â = E′, i.e., only edges existing in H can be added to G,
• R̂ = ∅, i.e., none of the edges can be removed.
Since R̂ = ∅, for any solution to the constructed instance of the Hiding Source by Modifying Edges problem we must have R∗ = ∅. Hence, we will omit mentions of R∗ in the remainder of the proof, and we will assume that a solution consists just of A∗.
Notice that the degree of the evader v† in G is n, and it does not change after the addition of any A ⊆ Â (as we can only add edges between the members of V′). Notice also that the degree of every node a_{i,j} is 1, and it cannot be increased. Therefore, the only nodes that can contribute to satisfying the safety threshold (by increasing their degree to a value greater than the degree of the evader) are the nodes in V′. The degree of any v_i in G is n − k + 2 (as it is connected with n − k + 1 nodes a_{i,j} and with the node v†). Therefore, in order for a given v_i to have greater degree than v†, we have to add to G at least k − 1 edges incident with v_i.
We will now show that the constructed instance of the Hiding Source by Modifying Edges problem has a solution if and only if the given instance of the Finding k-Clique problem has a solution.
Assume that there exists a solution to the given instance of the Finding k-Clique problem, i.e., a subset V∗ ⊆ V′ forming a k-clique in H. We will show that A∗ = V∗ × V∗ is a solution to the constructed instance of the Hiding Source by Modifying Edges problem.
First, notice that indeed A∗ ⊆ Â, as Â contains all edges from H, and the nodes in V∗ form a clique in H. Notice also that adding A∗ to G increases the degree of the k nodes in V∗ to n + 1, i.e., to a value greater than the degree of the evader. Hence, there now exist k nodes with degree greater than the evader. We showed that if there exists a solution to the given instance of the Finding k-Clique problem, then there also exists a solution to the constructed instance of the Hiding Source by Modifying Edges problem.
Assume that there exists a solution A∗ to the constructed instance of the Hiding Source by Modifying Edges problem. We will show that V∗ = ⋃_{(v,w) ∈ A∗} {v, w} forms a k-clique in H. Notice that since A∗ is a solution, it increases the degree of at least k nodes in V′ (since the safety threshold is ω = k) by at least k − 1. However, since the budget is b = k(k − 1)/2 and every added edge raises the degree of exactly two nodes, adding A∗ must increase the degree of exactly k nodes in V′ by exactly k − 1. If such a choice is available, the nodes in V∗ form a clique in Â; therefore, they also form a clique in H. We showed that if there exists a solution to the constructed instance of the Hiding Source by Modifying Edges problem, then there also exists a solution to the given instance of the Finding k-Clique problem.
This concludes the proof.

Theorem 8.
The problem of Hiding Source by Modifying Edges is NP-complete given the Closeness source detection algorithm.
Proof.
The problem is trivially in NP, since after adding and removing the given sets of edges A ∗ and R ∗ , itis possible to compute the closeness centrality ranking of all nodes in G I in polynomial time.We will now prove that the problem is NP-hard. To this end, we will show a reduction from theNP-complete -Set Cover problem. The decision version of this problem is defined by a universe, U = { u , . . . , u | U | } , and a collection of sets S = { S , . . . , S | S | } such that ∀ i S i ⊂ U and ∀ i | S i | = 3 , where the goalis to determine whether there exist k elements of S the union of which equals U .Let ( U, S ) be a given instance of the -Set Cover problem. Let us assume that | S | + | U | ≥ , note thatall instances where | S | + | U | < can be solved in polynomial time. We will now construct an instance of theHiding Source by Modifying Edges problem. 36irst, let us construct a network G = ( V, E ) where:• V = { v † , w } ∪ S ∪ U ∪ (cid:83) | S |− k +1 i =1 { a i } ,• E = { ( v † , w ) } ∪ (cid:83) a i ∈ V { ( w, a i ) } ∪ (cid:83) S i ∈ V { ( v † , S i ) } ∪ (cid:83) S i ∈ V (cid:83) u j ∈ S i { ( S i , u j ) } .In what follows we will denote the set of nodes S , . . . , § | S | by S , and we will denote the set of nodes u , . . . , u | U | by U . Notice that u j ∈ S i in the last union in the formula of E means that we connect a givennode S i ∈ S only with nodes u j ∈ U corresponding to the elements contained in S i in the given instance ofthe -Set Cover problem. An example of the construction of the network G is presented in Figure 25. 𝒗𝒗 † 𝒘𝒘 𝐺𝐺 𝑺𝑺 𝑺𝑺 𝑺𝑺 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝒖𝒖 𝑆𝑆 𝑆𝑆 𝑆𝑆 𝒂𝒂 𝒂𝒂 Figure 25: Construction of the network used in the proof of Theorem 8. Green dotted edges are allowedto be added. 
Colors of the nodes u i express correspondence with the elements of the universe in the -SetCover problem instance.Now, consider the instance ( G, v † , I, σ, ω, b, (cid:98) A, (cid:98) R ) of the Hiding Source by Modifying Edges problem,where:• G is the network we just constructed,• v † is the evader,• I = V , i.e., all nodes in G are infected,• σ is the Closeness source detection algorithm,• ω = 1 is the safety threshold,• b = k , where k is the size of solution of the given instance of the -Set Cover problem,• (cid:98) A = { w } × S , i.e., we allow to add any edges between w and nodes in S ,• (cid:98) R = ∅ , i.e., none of the edges can be removed.Since (cid:98) R = ∅ , for any solution to the constructed instance of the Hiding Source by Modifying Edgesproblem we must have R ∗ = ∅ . Hence, we will omit mentions of R ∗ in the remainder of the proof, and wewill assume that a solution consists just of A ∗ .First, let us analyze the closeness centrality values of the nodes in G after the addition of any A ⊆ (cid:98) A . Let x A denote the number of nodes u j ∈ U such that d ( w, u j ) = 2 , i.e., the number of nodes u j ∈ U such that w is connected (via the edges from A ) with at least one node S i connected with u j (notice that there are noother possibilities for w to be in distance from a node in U ). Moreover, let D v denote the sum of distancesfrom v to all nodes in the network, i.e., D v = (cid:80) w ∈ V d ( v, w ) . Notice that σ clos ( v, G, I ) = (cid:80) w ∈ I D v , i.e.,greater value of D v implies lower position of the ranking of nodes according to Closeness source detectionalgorithm. 
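The quantities D_v defined above can be checked numerically on a small instance. The following Python sketch builds the network from the construction (evader, node w, the |S| − k + 1 nodes a_i, set nodes, and element nodes), adds a chosen set of edges between w and set nodes, and compares D_w with D_v† using breadth-first search. The helper names and the example sets are hypothetical, chosen only for illustration.

```python
from collections import deque


def link(adj, a, b):
    """Add an undirected edge between a and b."""
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)


def distance_sum(adj, source):
    """D_v: sum of BFS distances from source to all nodes (network assumed connected)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return sum(dist.values())


def build_instance(sets, k):
    """Network G from the construction; 'ev' stands for the evader."""
    adj = {}
    link(adj, "ev", "w")
    for i in range(len(sets) - k + 1):
        link(adj, "w", ("a", i))
    for name, members in sets.items():
        link(adj, "ev", ("S", name))
        for u in members:
            link(adj, ("S", name), ("u", u))
    return adj


def w_beats_evader(sets, k, chosen):
    """True if, after linking w to the chosen set nodes, D_w < D_ev."""
    adj = build_instance(sets, k)
    for name in chosen:
        link(adj, "w", ("S", name))
    return distance_sum(adj, "w") < distance_sum(adj, "ev")


# Hypothetical 3-Set Cover instance: U = {1,...,6}, k = 2.
sets = {"A": {1, 2, 3}, "B": {4, 5, 6}, "C": {1, 4, 5}}
cover_works = w_beats_evader(sets, 2, ["A", "B"])      # A and B cover U
non_cover_fails = w_beats_evader(sets, 2, ["A", "C"])  # element 6 is not covered
print(cover_works, non_cover_fails)
```

In this small example, w overtakes the evader only when the chosen sets cover the whole universe, which is the behavior the reduction relies on.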
Table 5 presents the computation of D_v for every node v ∈ V after the addition of a given A ⊆ Â.

v     d(v, v†)   d(v, w)   Σ_j d(v, a_j)        Σ_j d(v, S_j)   Σ_j d(v, u_j)     D_v
v†    0          1         2(|S| − k + 1)       |S|             2|U|              3|S| + 2|U| − 2k + 3
w     1          0         |S| − k + 1          2|S| − |A|      3|U| − x_A        3|S| + 3|U| − k − |A| − x_A + 2
a_i   2          1         2(|S| − k)           3|S| − |A|      4|U| − x_A        5|S| + 4|U| − 2k − |A| − x_A + 3
S_i   1          ≥ 1       ≥ 2(|S| − k + 1)     2(|S| − 1)      3 + 3(|U| − 3)    ≥ 4|S| + 3|U| − 2k − 4
u_i   2          ≥ 2       ≥ 3(|S| − k + 1)     ≥ |S|           ≥ 2(|U| − 1)      ≥ 4|S| + 2|U| − 3k + 5

Table 5: Sums of distances between nodes of the network after the addition of A ⊆ Â, used in the proof of Theorem 8.

Since the safety threshold is ω = 1, only one node has to have greater closeness centrality than v† after the addition of a given A ⊆ Â in order for said A to be a solution to the constructed instance of the Hiding Source by Modifying Edges problem. However, notice that after adding any A to the network we have D_v† < D_a_i, D_v† < D_u_i, and D_v† < D_S_i (the last inequality holds when we use the assumption that |S| + |U| ≥ 8). Hence, the only node that can have greater closeness centrality than v† (and therefore a higher position in the ranking used to determine the source of diffusion) is w. In other words, a given A ⊆ Â is a solution to the constructed instance of the Hiding Source by Modifying Edges problem if and only if we have D_v† > D_w after the addition of A.
We will now show that the constructed instance of the Hiding Source by Modifying Edges problem has a solution if and only if the given instance of the 3-Set Cover problem has a solution.
Assume that there exists a solution to the given instance of the 3-Set Cover problem, i.e., a subset S∗ ⊆ S of size k the union of which is the universe U. We will show that A∗ = {w} × S∗ (i.e., connecting w with nodes corresponding to all sets in S∗) is a solution to the constructed instance of the Hiding Source by Modifying Edges problem. First, notice that after the addition of A∗ there exists a path of length two from w to every node u_j ∈ U, leading through the node corresponding to an element S_i ∈ S∗ containing u_j, with which w is now connected. Hence, we have that x_A∗ = |U| and |A∗| = k, which gives us:
D_w = 3|S| + 3|U| − k − |A∗| − x_A∗ + 2 = 3|S| + 2|U| − 2k + 2 < 3|S| + 2|U| − 2k + 3 = D_v†.
Therefore, after the addition of A∗, node w has greater closeness centrality than the evader. We showed that if there exists a solution to the given instance of the 3-Set Cover problem, then there also exists a solution to the constructed instance of the Hiding Source by Modifying Edges problem.
Assume that there exists a solution A∗ to the constructed instance of the Hiding Source by Modifying Edges problem. We will show that S∗ = {S_i ∈ S : (w, S_i) ∈ A∗} is a cover of U. Let us compute the difference between D_v† and D_w after the addition of A∗ (based on the values from Table 5):
D_v† − D_w = |A∗| + x_A∗ + 1 − k − |U|.
Since A∗ is a solution, w must have greater closeness centrality than v†, implying D_v† − D_w > 0, which gives us:
|A∗| + x_A∗ + 1 > k + |U|.
Notice however that |A∗| ≤ k (since the evader's budget is b = k) and that x_A∗ ≤ |U| (since x_A∗ is the number of nodes in U at distance 2 from w). Therefore, we must have |A∗| = k and x_A∗ = |U|, which means that after the addition of A∗ for every u_j ∈ U we have a node S_i ∈ S connected to both w and u_j (there is no other way of forming a path of length 2 between w and u_j). Consequently, every element of the universe u_j ∈ U is covered by at least one set S_i ∈ S∗, as w is connected with a node S_i only if it belongs to S∗, and S_i is connected with u_j only if it contains u_j in the 3-Set Cover problem instance. We showed that if there exists a solution to the constructed instance of the Hiding Source by Modifying Edges problem, then there also exists a solution to the given instance of the 3-Set Cover problem.
This concludes the proof.

Theorem 9. The problem of Hiding Source by Modifying Edges is NP-complete given the Betweenness source detection algorithm.
Proof.
The problem is trivially in NP, since after adding and removing the given sets of edges A ∗ and R ∗ , itis possible to compute the betweenness centrality ranking of all nodes in G I in polynomial time.We will now prove that the problem is NP-hard. To this end, we will show a reduction from theNP-complete Finding k -Clique problem. The decision version of this problem is defined by a network, H = ( V (cid:48) , E (cid:48) ) , where V (cid:48) = { v , . . . , v n } , and a constant k ∈ N , where the goal is to determine whether thereexist k nodes forming a clique in H .Let ( H, k ) be a given instance of the Finding k -Clique problem. Let us assume that n ≥ , i.e., network H has at least three nodes. We will now construct an instance of the Hiding Source by Modifying Edgesproblem.First, let us construct a network G = ( V, E ) where:• V = V (cid:48) ∪ { v † , w , w } ∪ (cid:83) ni =1 { a i } ∪ (cid:83) n − i =1 { b i } ,• E = E (cid:48) ∪ { ( v , w ) , ( v , w ) } ∪ (cid:83) v i ∈ V { ( v i , v † ) , ( v i , a i ) } ∪ (cid:83) b i ∈ V { ( b i , w ) , ( b i , w ) } .In what follows we will denote the set of nodes a , . . . , a n by A , we will denote the set of nodes b , . . . , b n − by B , and we will denote the set of nodes w , w by W . Notice that two nodes v i , v j ∈ V (cid:48) are connected in G if and only if the are connected in H (as E (cid:48) is included in E , and no other edges are added between themembers of V (cid:48) ). An example of the construction of the network G is presented in Figure 26. 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 † 𝐻𝐻 𝐺𝐺 𝒘𝒘 𝒘𝒘 𝒃𝒃 𝒃𝒃 𝒂𝒂 𝒂𝒂 𝒂𝒂 𝒂𝒂 Figure 26: Construction of the network used in the proof of Theorem 9. 
Red dashed edges are allowed tobe removed.Now, consider the instance ( G, v † , I, σ, ω, b, (cid:98) A, (cid:98) R ) of the Hiding Source by Modifying Edges problem,where:• G is the network we just constructed,• v † is the evader,• I = V , i.e., all nodes in G are infected,• σ is the Betweenness source detection algorithm,• ω = 2 n is the safety threshold,• b = n − k , where k is the size of clique in the given instance of Finding k -Clique problem,• (cid:98) A = ∅ , i.e., none of the edges can be removed, 39 (cid:98) R = { v † } × V (cid:48) , i.e., only edges between v † and nodes in V (cid:48) can be removed.Since (cid:98) A = ∅ , for any solution to the constructed instance of the Hiding Source by Modifying Edgesproblem we must have A ∗ = ∅ . Hence, we will omit mentions of A ∗ in the remainder of the proof, and wewill assume that a solution consists just of R ∗ .Let C v denote the value of σ betw ( v, G, I ) , i.e., the value used to determine the source of diffusion. Toremind the reader: C v = (cid:88) u (cid:54) = w : u,w ∈ I \{ v } |{ π ∈ Π( u, w ) : v ∈ π }|| Π( u, w ) | where Π( u, w ) is the set of shortest paths between the nodes u and w .We will now make the following observations about the values of C v in G after removal of an arbitrary R ⊆ (cid:98) R :• C v i ≥ n − for any v i ∈ V (cid:48) , as it controls all shortest paths between a i an all other n − nodes,• C w i = ( n − n +1)2 for any w i ∈ W , as it controls half of all the shortest paths between n − nodes in B and all other n + 1 nodes in { v † } ∪ V (cid:48) ∪ A (the other half is controlled by the other node in W )and it does not control any other shortest paths,• C a i = 0 for any a i ∈ A , as it does not control any shortest paths,• C b i = n − for any b i ∈ B , as it controls one of n − shortest paths between w and w (the otherpaths are controlled by other nodes in B and by the node v ) and it does not control any other shortestpaths.Notice that if after the removal of a given R ⊆ 
(cid:98) R the evader v † is connected with at least two nodes in v i , v j ∈ V (cid:48) that do not have an edge between them, then we have C v † ≥ n − , as v † controls one shortestpaths between v i and v j , where other paths can only be controlled by the other n − nodes in V (cid:48) . The onlynodes with the values of C v greater than n − are the n + 2 nodes in V (cid:48) ∪ W . However, the safety thresholdis ω = 2 n , so we also need n − nodes in B to have greater value of C v than v † (notice that since C a i = 0 for all a i ∈ A , they will never contribute to the safety threshold). Therefore, the safety margin requirementis fulfilled only if all neighbors of v † form a clique (in which case C v † = 0 ).We will now show that the constructed instance of the Hiding Source by Modifying Edges problem hasa solution if and only if the given instance of the Finding k -Clique problem has a solution.Assume that there exists a solution to the given instance of the Finding k -Clique problem, i.e., a subset V ∗ ⊆ V (cid:48) forming a k -clique in H . Notice that after the removal of R ∗ = { v † } × ( V (cid:48) \ V ∗ ) , the evader v † isonly connected with the nodes in V ∗ . Since they form a clique in H , they also form a clique in G , henceaccording to the above observation the safety threshold is met. Notice also that | R ∗ | = n − k , hence thesolution is within the evader’s budget. We showed that if there exists a solution to the given instance of theFinding k -Clique problem, then there also exists a solution to the constructed instance of the Hiding Sourceby Modifying Edges problem.Assume that there exists a solution R ∗ to the constructed instance of the Hiding Source by ModifyingEdges problem. As observed above, the remaining nodes with which v † is connected, i.e., nodes V ∗ = { v i ∈ V (cid:48) : ( vs, v i ) / ∈ R ∗ } must form a clique in G , hence they also must form a clique in H . 
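The observation that the evader's betweenness drops to zero exactly when its neighbors form a clique can be illustrated with a short Python sketch. It computes C_v by counting shortest paths with breadth-first search, summing over unordered pairs of nodes (this pair convention is an assumption; the ranking is unaffected by it). The function names and the two toy networks are hypothetical and much smaller than the construction in the proof.

```python
from collections import deque
from itertools import combinations


def make_graph(edges):
    """Undirected adjacency sets from a list of edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_counts(adj, source):
    """BFS distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                sigma[nb] = 0
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                sigma[nb] += sigma[node]
    return dist, sigma


def betweenness(adj, v):
    """Sum over pairs {u, w} (u, w != v) of the fraction of shortest u-w paths through v."""
    info = {node: bfs_counts(adj, node) for node in adj}
    dist_v, sigma_v = info[v]
    total = 0.0
    for u, w in combinations([x for x in adj if x != v], 2):
        dist_u, sigma_u = info[u]
        if w not in dist_u or u not in dist_v or w not in dist_v:
            continue  # pair not connected through v
        if dist_u[v] + dist_v[w] == dist_u[w]:
            total += sigma_u[v] * sigma_v[w] / sigma_u[w]
    return total


# Evader "ev" linked to v1, v2, v3, each with one pendant node (toy example).
base = [("ev", "v1"), ("ev", "v2"), ("ev", "v3"),
        ("v1", "a1"), ("v2", "a2"), ("v3", "a3")]
clique = make_graph(base + [("v1", "v2"), ("v2", "v3"), ("v1", "v3")])
no_clique = make_graph(base + [("v1", "v2"), ("v2", "v3")])

score_clique = betweenness(clique, "ev")
score_no_clique = betweenness(no_clique, "ev")
print(score_clique, score_no_clique)
```

When the neighbors of the evader are pairwise connected, no shortest path needs to pass through the evader; removing a single edge between two neighbors immediately gives the evader a positive score.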
Since the budget of the evader is b = n − k, there are at least k nodes in V∗. We showed that if there exists a solution to the constructed instance of the Hiding Source by Modifying Edges problem, then there also exists a solution to the given instance of the Finding k-Clique problem.
This concludes the proof.

Theorem 10.
The problem of Hiding source by Modifying Edges is NP-complete given the Rumor sourcedetection algorithm. roof. The problem is trivially in NP, since after adding and removing the given sets of edges A ∗ and R ∗ , itis possible to compute the rumor centrality ranking of all nodes in G I in polynomial time.We will now prove that the problem is NP-hard. To this end, we will show a reduction from theNP-complete Finding a Hamiltonian Cycle problem. The decision version of this problem is defined by anetwork, H = ( V (cid:48) , E (cid:48) ) , where V (cid:48) = { v , . . . , v n } , and where the goal is to determine whether there exists aHamiltonian cycle in H , i.e., a cycle that visits each node exactly once.Let ( H ) be a given instance of the problem of Finding a Hamiltonian Cycle. We will now construct aninstance of the problem of Hiding Source by Modifying Edges.First, let us construct a network G = ( V, E ) where:• V = V (cid:48) ∪ { v † , w } ∪ (cid:83) i =1 (cid:83) nj =1 { a i,j } ,• E = E (cid:48) ∪ { ( v † , v ) } ∪ (cid:83) v i ∈ N H ( v ) { ( v i , w ) } ∪ (cid:83) a i,n ∈ V { ( a i,n , w ) } ∪ (cid:83) i =1 (cid:83) n − j =1 { ( a i,j , a i,j +1 ) } .Notice that the evader v † is connected only with v , while w is connected with all neighbors of v in H (wecan assume that v has at least two neighbors in H , as otherwise H definitely does not have a Hamiltoniancycle). An example of the construction of the network G is presented in Figure 27. 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒘𝒘 𝐻𝐻 𝐺𝐺 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 𝒗𝒗 † 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , 𝒂𝒂 , Figure 27: Construction of the network used in the proof of Theorem 10. 
Red dashed edges are allowed tobe removed.Now, consider the instance ( G, v † , I, σ, ω, b, (cid:98) A, (cid:98) R ) of the problem of Hiding Source by Modifying Edges,where:• G is the network we just constructed,• v † is the evader,• I = V , i.e., all nodes in G are infected,• σ is the Rumor source detection algorithm,• ω = 4 n + 1 is the safety threshold,• b = | E (cid:48) | + | N H ( v ) | − n ,• (cid:98) A = ∅ , i.e., none of the edges can be added,• (cid:98) R = E (cid:48) ∪ ( { w } × N H ( v )) , i.e., only edges belonging to the original set of edges in H and edges between w and neighbors of v in H can be removed.Since (cid:98) A = ∅ , for any solution to the constructed instance of the problem of Hiding Source by ModifyingEdges, we must have A ∗ = ∅ . Hence, we will omit mentions of A ∗ in the remainder of the proof, and we willassume that a solution consists of just R ∗ .Let d R denote distance between v and w in G after the removal of R , i.e., d R = d ( V,E \ R ) ( v , w ) . We willfirst prove the following lemma. 41 emma 2. A given R is a solution to the constructed instance of the problem of Hiding Source by ModifyingEdges if and only if d R = n .Proof of Lemma 2. We first show that if d R = n then R is a solution to the constructed instance of theHiding Source by Modifying Edges problem. Notice that d R = n implies that all nodes in V (cid:48) form a pathbetween v † and w (as presented in the example in Figure 28), as there are exactly n nodes in V (cid:48) , no othernodes can be part of the shortest path between v † and w , and if at least one additional edge was added tothis path, the distance between v and w would be smaller than n . 
Figure 28: Network G from Figure 27 after the removal of R such that nodes in V′ form a path between v† and w.

To remind the reader, the score assigned to a given node by the Rumor source detection algorithm is
$$\sigma_{rumor}(v, G, I) = \frac{|I|!}{\prod_{w \in I} \Theta^v_w}$$
where $\Theta^v_w$ is the size of the subtree of w in the BFS tree of G_I rooted at v. Let $C_v(d_R)$ denote $\prod_{w \in I \setminus \{v\}} \Theta^v_w$ in G where R was removed. Notice that greater $C_v(d_R)$ implies lower $\sigma_{rumor}(v, (V, E \setminus R), I)$ and vice versa, as we have $\sigma_{rumor}(v, (V, E \setminus R), I) = \frac{|I|!}{|V| \, C_v(d_R)}$.
Let us now compute the values of $C_v(n)$ of all nodes in G, i.e., the value of $C_v$ when the removal of R caused the nodes in V′ to form a path between v† and w:
• $C_{v^\dagger}(n) = (4n+1)(4n) \cdots (3n+1) \, (n!)^3 = \frac{(4n+1)! \, (n!)^3}{(3n)!}$,
• $C_{v_i}(n) = j! \, (4n-j+1)(4n-j) \cdots (3n+1) \, (n!)^3 = \frac{j! \, (4n-j+1)! \, (n!)^3}{(3n)!}$, where $v_i$ is the j-th node on the path from v† to w,
• $C_w(n) = (n+1)! \, (n!)^3$,
• $C_{a_{i,j}}(n) = (j-1)! \, (4n-j+2)(4n-j+1) \cdots (3n+2) \, (n+1)! \, (n!)^2 = \frac{(j-1)! \, (4n-j+2)! \, (n+1)! \, (n!)^2}{(3n+1)!}$.
To give the reader a better understanding of how the values of $C_v(n)$ are computed, we will describe in more detail the computation for $C_{v^\dagger}(n)$; values for other nodes are computed analogously. We root the BFS tree of network G after the removal of R in node v† (since the network is already a tree, all of its edges are part of the BFS tree). The subtree rooted at $v_1$ is then of size 4n + 1, as it contains all nodes of the network other than v†. The subtree of the next node from V′ on the path from v† to w (in case of the example in Figure 28 it is v ) is of size 4n. The size of the subtree of each subsequent node on the path from v† to w is lower by one, until the subtree rooted at w, which is of size 3n + 1.
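The subtree-size products can be verified numerically for a small n. The sketch below assumes the tree shape described for d_R = n (the evader, a path of n nodes from V′ ending at w, and three chains of n nodes a_{i,j} attached to w) and computes, for a chosen root, the product of subtree sizes over all other nodes. The function names and node labels are hypothetical.

```python
from collections import deque
from math import factorial


def subtree_product(adj, root):
    """Product of BFS-tree subtree sizes over all nodes except the root."""
    parent = {root: None}
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in adj[node]:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    size = {node: 1 for node in order}
    product = 1
    # Reverse BFS order visits every child before its parent.
    for node in reversed(order):
        if parent[node] is not None:
            size[parent[node]] += size[node]
            product *= size[node]
    return product


def path_instance(n):
    """Tree obtained when the nodes of V' form a path between the evader and w."""
    edges = [("ev", ("v", 1))]
    edges += [(("v", j), ("v", j + 1)) for j in range(1, n)]
    edges.append((("v", n), "w"))
    for i in range(1, 4):
        edges += [(("a", i, j), ("a", i, j + 1)) for j in range(1, n)]
        edges.append((("a", i, n), "w"))
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


n = 3
tree = path_instance(n)
products = {v: subtree_product(tree, v) for v in tree}
evader_is_last = all(products["ev"] > c for v, c in products.items() if v != "ev")
print(products["ev"], products["w"], evader_is_last)
```

For n = 3 the evader obtains the largest product, hence the lowest rumor centrality, in line with the closed-form expressions for C_v(n).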
Based on similar reasoning, the product of sizes of subtrees on each branch consisting of nodes $a_{i,j}$ for a fixed i is $n!$, and there are three such branches, which gives us $(n!)^3$.
To prove that R is a solution to the constructed instance of the problem of Hiding Source by Modifying Edges, we now have to show that the safety threshold is met, i.e., that the evader has lower rumor centrality (greater $C_v(n)$) than all other nodes in G. Indeed, we have:
• $\frac{C_{v^\dagger}(n)}{C_{v_i}(n)} = \frac{(4n+1)!}{j! \, (4n-j+1)!} > 1$, given that $1 \le j \le n$ as $v_i$ is the j-th node on the path from v† to w,
• $\frac{C_{v^\dagger}(n)}{C_w(n)} = \frac{(4n+1)!}{(3n)! \, (n+1)!} > 1$,
• $\frac{C_{v^\dagger}(n)}{C_{a_{i,j}}(n)} = \frac{(4n+1)! \, (n!)^3}{(3n)!} \cdot \frac{(3n+1)!}{(j-1)! \, (4n-j+2)! \, (n+1)! \, (n!)^2} = \frac{(4n+1)! \, (3n+1)}{(j-1)! \, (4n-j+2)! \, (n+1)} > 1$, given that $1 \le j \le n$.
We have shown that if $d_R = n$ then R is a solution to the constructed instance of the problem of Hiding Source by Modifying Edges.
We will now show that if R is a solution to the constructed instance of the problem of Hiding Source by Modifying Edges then $d_R = n$. Assume to the contrary that $d_R < n$ and R is a solution. Since R is a solution, all nodes must have greater rumor centrality (and lower $C_v$) than v†; in particular, we must have $C_{a_{1,1}} < C_{v^\dagger}$.
We have that:
$$C_{v^\dagger}(d_R) \le (4n+1)(4n) \cdots (4n+1-d_R) \, (n-d_R)! \, (n!)^3 = \frac{(4n+1)! \, (n-d_R)! \, (n!)^3}{(4n-d_R)!}$$
because there are $d_R$ nodes from V′ on the path from v† to w, while the other $n - d_R$ nodes have contribution at most $(n-d_R)!$ to the product (notice that $k!$ is the maximal value for the product of sizes of k subtrees).
Similarly, we also have that:
$$C_{a_{1,1}}(d_R) \ge (4n+1)(4n) \cdots (3n+2) \, (d_R+1)! \, (n!)^2 = \frac{(4n+1)! \, (d_R+1)! \, (n!)^2}{(3n+1)!}$$
because the contribution of the nodes from V′ on the shortest path from w to v† is at least $(d_R+1)!$
, while the contribution of the other $n - d_R$ nodes in V′ is at least 1.
To complete the proof of the lemma, we will now show that if $d_R < n$ then $C_{v^\dagger}(d_R) \le C_{a_{1,1}}(d_R)$ (hence, our assumption was false and R cannot be a solution). Let $UL(d_R)$ be the upper limit on $\frac{C_{v^\dagger}(d_R)}{C_{a_{1,1}}(d_R)}$, i.e.:
$$\frac{C_{v^\dagger}(d_R)}{C_{a_{1,1}}(d_R)} \le UL(d_R).$$
The formula of $UL(d_R)$, based on the equations in the two previous paragraphs, is:
$$UL(d_R) = \frac{(4n+1)! \, (n-d_R)! \, (n!)^3}{(4n-d_R)!} \cdot \frac{(3n+1)!}{(4n+1)! \, (d_R+1)! \, (n!)^2} = \frac{(n-d_R)! \, n! \, (3n+1)!}{(4n-d_R)! \, (d_R+1)!}.$$
In particular:
• For $d_R = n - 1$ we have that:
$$UL(n-1) = \frac{1! \, n! \, (3n+1)!}{(3n+1)! \, n!} = 1.$$
Hence $C_{v^\dagger}(n-1) \le C_{a_{1,1}}(n-1)$, i.e., R is not a solution when $d_R = n - 1$.
• We also have that:
$$\frac{UL(d_R+1)}{UL(d_R)} = \frac{(n-d_R-1)! \, n! \, (3n+1)!}{(4n-d_R-1)! \, (d_R+2)!} \cdot \frac{(4n-d_R)! \, (d_R+1)!}{(n-d_R)! \, n! \, (3n+1)!} = \frac{4n-d_R}{(d_R+2)(n-d_R)}.$$
Notice that since d R < n and d R ≥ we have that: UL(d_R + 1) / UL(d_R) > n n − > 1. Hence $UL(d_R)$ is increasing with $d_R$, and for $d_R < n - 1$ we have that $UL(d_R) < UL(n-1) = 1$, which implies that $C_{v^\dagger}(d_R) < C_{a_{1,1}}(d_R)$ for $d_R < n - 1$, i.e., R is not a solution when $d_R < n - 1$.
We showed that if $R$ is a solution to the constructed instance of the problem of Hiding Source by Modifying Edges then $d_R = n$.

This concludes the proof of Lemma 2.

Having proved Lemma 2, we now move back to proving Theorem 10. To this end, we will show that the constructed instance of the problem of Hiding Source by Modifying Edges has a solution if and only if the given instance of the problem of Finding a Hamiltonian Cycle has a solution.

Assume that there exists a solution to the given instance of the Finding a Hamiltonian Cycle problem, i.e., a set of edges $E^* \subseteq E'$ inducing a Hamiltonian cycle in $H$. Let $v^*$ be one of the neighbors of $v_1$ on the cycle. We will show that $R^* = (E' \setminus E^*) \cup \{(v_1, v^*)\} \cup (\{w\} \times (N_H(v_1) \setminus \{v^*\}))$ is a solution to the constructed instance of the Hiding Source by Modifying Edges problem. By removing the edges $(E' \setminus E^*) \cup \{(v_1, v^*)\}$ from $G$, the nodes in $V'$ now induce a path in $G$. Node $v_1$ is connected with $v^\dagger$ and with the neighbor on the cycle other than $v^*$, whereas $v^*$ is connected with $w$. Notice also that since we removed the edges in $\{w\} \times (N_H(v_1) \setminus \{v^*\})$ from $G$, the node $v^*$ is the only node in $V'$ connected to $w$. Finally, notice that the number of removed edges is within the evader's budget. Hence, we have that $v^\dagger$ is connected to $w$ with a path of nodes from $V'$ (a situation presented in Figure 28), which implies that $d_{R^*} = n$. Based on Lemma 2, we showed that $R^*$ is a solution to the constructed instance of the Hiding Source by Modifying Edges problem.

Assume that there exists a solution $R^*$ to the constructed instance of the problem of Hiding Source by Modifying Edges. Based on Lemma 2, this implies that $d_{R^*} = n$, i.e., $v^\dagger$ is connected to $w$ with a path of nodes from $V'$. Let $v^*$ be the node from $V'$ directly connected to $w$, and let $E^*$ be the set of edges induced by $V'$ in $E \setminus R^*$ (edges that form a path from $v^\dagger$ to $w$ after the removal of $R^*$).
Since all edges in $G$ between the nodes in $V'$ exist also in $H$, the set $E^*$ induces a path connecting all the nodes in $H$. Moreover, since $v_1$ is the only node in $V'$ connected to $v^\dagger$ in $G$, it has to be the first node on the path formed by $E^*$, with $v^*$ being the last node. However, because of the way we constructed $G$, $v^*$ is a neighbor of $v_1$ in $H$. Therefore, the set $E^* \cup \{(v_1, v^*)\}$ induces a Hamiltonian cycle in $H$. We showed that if there exists a solution to the constructed instance of the problem of Hiding Source by Modifying Edges, then there also exists a solution to the given instance of the problem of Finding a Hamiltonian Cycle.

This concludes the proof.

Theorem 11.
The problem of Hiding Source by Modifying Edges is NP-complete given the Random Walk source detection algorithm.

Proof.
The problem is trivially in NP, since after adding and removing the given sets of edges $A^*$ and $R^*$, it is possible to compute the Random Walk source detection algorithm scores of all nodes in $G_I$ in polynomial time.

We will now prove that the problem is NP-hard. To this end, we will show a reduction from the NP-complete Finding a Hamiltonian Cycle problem. The decision version of this problem is defined by a network, $H = (V', E')$, where $V' = \{v_1, \ldots, v_n\}$, and where the goal is to determine whether there exists a Hamiltonian cycle in $H$, i.e., a cycle that visits each node exactly once.

Let $(H)$ be a given instance of the problem of Finding a Hamiltonian Cycle. We will now construct an instance of the problem of Hiding Source by Modifying Edges.

First, let us construct a network $G = (V, E)$ where:
• $V = V' \cup \{v^\dagger, w, u\}$,
• $E = E' \cup \{(v^\dagger, v_1), (w, u)\} \cup \bigcup_{v_i \in N_H(v_1)} \{(v_i, w)\}$.

Notice that the evader $v^\dagger$ is connected only with $v_1$, while $w$ is connected with all neighbors of $v_1$ in $H$ (we can assume that $v_1$ has at least two neighbors in $H$, as otherwise $H$ definitely does not have a Hamiltonian cycle). An example of the construction of the network $G$ is presented in Figure 29.

Figure 29: Construction of the network used in the proof of Theorem 11. Red dashed edges are allowed to be removed.

Now, consider the instance $(G, v^\dagger, I, \sigma, \omega, b, \widehat{A}, \widehat{R})$ of the problem of Hiding Source by Modifying Edges, where:
• $G$ is the network we just constructed,
• $v^\dagger$ is the evader,
• $I = V$, i.e., all nodes in $G$ are infected,
• $\sigma$ is the Random Walk source detection algorithm,
• $\omega = n + 1$ is the safety threshold,
• $b = |E'| + |N_H(v_1)| - n$,
• $\widehat{A} = \emptyset$, i.e., none of the edges can be added,
• $\widehat{R} = E' \cup (\{w\} \times N_H(v_1))$, i.e., only edges belonging to the original set of edges in $H$ and edges between $w$ and neighbors of $v_1$ in $H$ can be removed.

Moreover, the Random Walk source detection algorithm needs to be parameterized with the model of spreading. Let the algorithm be parameterized with the model in which the probability of propagation is p = and the time of diffusion is $T = n + 1$.

Since $\widehat{A} = \emptyset$, for any solution to the constructed instance of the problem of Hiding Source by Modifying Edges, we must have $A^* = \emptyset$. Hence, we will omit the mentions of $A^*$ in the remainder of the proof, and we will assume that a solution consists just of $R^*$.

We will first prove the following lemma.

Lemma 3.
If all nodes of the network are infected then the Random Walk source detection algorithm assigns to a given node $v \in V$ score $1$ if $\forall_{w \in I}\, d_G(v, w) \leq T$ and $0$ otherwise.

Proof of Lemma 3. To remind the reader, the formula of the Random Walk source detection algorithm is:
$$\sigma_{rwalk}(v, G, I) = \begin{cases} \varphi_0(v) & \text{if } \forall_{w \in I}\, d_G(v, w) \leq T \\ 0 & \text{otherwise} \end{cases}$$
where $T$ is the number of rounds in the SI model and $\varphi$ is defined using the formula:
$$\varphi_t(v) = \begin{cases} 1 & \text{if } t = T \\ (1-p)\,\varphi_{t+1}(v) + \sum_{w \in N(v) \cap I} \frac{p}{|N(v)|}\,\varphi_{t+1}(w) & \text{otherwise} \end{cases}$$
where $p$ is the probability of infection in the SI model.

Hence, the fact that the node $v$ gets assigned a score of $0$ if $\neg \forall_{w \in I}\, d_G(v, w) \leq T$ follows from the formula of $\sigma_{rwalk}$. We have to prove that if all nodes are infected and $\forall_{w \in I}\, d_G(v, w) \leq T$ then $v$ is assigned a score of $1$.

Assume that $I = V$ and $\forall_{v \in I}\, \varphi_{t+1}(v) = 1$. We have that:
$$\varphi_t(v) = (1-p) + |N(v)| \frac{p}{|N(v)|} = (1-p) + p = 1.$$
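The induction step above can also be checked numerically. The following is a minimal Python sketch of the recursion for $\varphi_t$ and of $\sigma_{rwalk}$ as written in this proof. The function names, the adjacency-dictionary representation, and the value p = 0.5 used in the example are assumptions made purely for illustration; they are not taken from the original construction.

```python
from collections import deque


def distances_from(adj, source):
    """Breadth-first-search distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def random_walk_scores(adj, infected, p, T):
    """Evaluate sigma_rwalk for every node using the recursion for phi_t."""
    # Base case: phi_T(v) = 1 for every node.
    phi = {v: 1.0 for v in adj}
    # Step backwards T times, from t = T - 1 down to t = 0.
    for _ in range(T):
        phi = {
            v: (1 - p) * phi[v]
            + sum(p / len(adj[v]) * phi[w] for w in adj[v] if w in infected)
            for v in adj
        }
    scores = {}
    for v in adj:
        dist = distances_from(adj, v)
        # Score is phi_0(v) only if every infected node is within distance T.
        within_reach = all(dist.get(w, float("inf")) <= T for w in infected)
        scores[v] = phi[v] if within_reach else 0.0
    return scores


# Hypothetical example: a path on n + 3 = 6 nodes (n = 3), all infected,
# with T = n + 1 = 4 and an illustrative p = 0.5.
path = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 5} for i in range(6)}
scores = random_walk_scores(path, infected=set(path), p=0.5, T=4)
```

On this path only the two endpoints are farther than $T$ from some infected node, so they receive score 0, while the remaining $n + 1$ nodes receive score 1.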
We have shown that if $I = V$ and $\forall_{v \in I}\, \varphi_{t+1}(v) = 1$ then $\forall_{v \in I}\, \varphi_t(v) = 1$. Since we also have that $\forall_{v \in I}\, \varphi_T(v) = 1$, then by induction we have that $\forall_{v \in I}\, \varphi_0(v) = 1$.

This concludes the proof of Lemma 3.

Since all nodes in $G$ are infected, the only way some of them can have a greater Random Walk algorithm score than the evader (and thus contribute to the safety threshold) is if $\exists_{w \in I}\, d_G(v^\dagger, w) > T$. Notice that since $T = n + 1$ and the network $G$ has $n + 3$ nodes, the only way in which the distance between $v^\dagger$ and some node is $n + 2$ or greater is if the nodes in $G$ form a path with $v^\dagger$ on one of its ends. In that case $v^\dagger$ and the node on the other end of the path have scores $0$, while all other $n + 1$ nodes have scores $1$, hence the safety threshold $\omega = n + 1$ is satisfied.

We will now show that the constructed instance of the problem of Hiding Source by Modifying Edges has a solution if and only if the given instance of the problem of Finding a Hamiltonian Cycle has a solution.

Assume that there exists a solution to the given instance of the problem of Finding a Hamiltonian Cycle, i.e., a set of edges $E^* \subseteq E'$ inducing a Hamiltonian cycle in $H$. Let $v^*$ be one of the neighbors of $v_1$ on the cycle. We will show that $R^* = (E' \setminus E^*) \cup \{(v_1, v^*)\} \cup (\{w\} \times (N_H(v_1) \setminus \{v^*\}))$ (notice that this set has exactly $b = |E'| + |N_H(v_1)| - n$ edges) is a solution to the constructed instance of the problem of Hiding Source by Modifying Edges. By removing the edges $(E' \setminus E^*) \cup \{(v_1, v^*)\}$ from $G$, the nodes in $V'$ now induce a path in $G$. Node $v_1$ is connected with $v^\dagger$ and with the neighbor on the cycle other than $v^*$, whereas $v^*$ is connected with $w$. Notice also that since we removed the edges in $\{w\} \times (N_H(v_1) \setminus \{v^*\})$ from $G$, $v^*$ is the only node in $V'$ connected with $w$. Finally, notice that the number of removed edges is within the evader's budget.
Hence, we have that after the removal of $R^*$ the network $G$ is a path with $v^\dagger$ on one of the ends (and $u$ on the other end). Based on Lemma 3 and the previous observations, we showed that $R^*$ is a solution to the constructed instance of the problem of Hiding Source by Modifying Edges.

Assume that there exists a solution $R^*$ to the constructed instance of the problem of Hiding Source by Modifying Edges. Based on Lemma 3 and the previous observations, this implies that after the removal of $R^*$ the network $G$ is a path with $v^\dagger$ on one of the ends. Notice that $u$ has to be the other end of the path, as it has only one neighbor, namely $w$. Hence, the nodes in the path between $v^\dagger$ and $w$ must be all the nodes in $V'$, connected with the edges from $E'$ (as we did not add any other edges between the nodes in $V'$). Moreover, the neighbor of $v^\dagger$ on the path must be $v_1$, while the neighbor of $w$ (other than $u$) on the path has to be one of the neighbors of $v_1$ in $H$. If we denote this neighbor of $w$ by $v^*$, then the set of edges induced by $V'$ in $(V, E \setminus R^*)$ with the addition of $(v_1, v^*)$ induces a Hamiltonian cycle in $H$. We showed that if there exists a solution to the constructed instance of the problem of Hiding Source by Modifying Edges, then there also exists a solution to the given instance of the problem of Finding a Hamiltonian Cycle.

This concludes the proof.

Theorem 12.
The problem of Hiding Source by Modifying Edges is NP-complete given the Monte Carlo source detection algorithm.

Proof.
The problem is trivially in NP, since after adding and removing the given sets of edges $A^*$ and $R^*$, it is possible to generate Monte Carlo samples and compute the ranking of all nodes in $G_I$ in polynomial time.

We will now prove that the problem is NP-hard. To this end, we will show a reduction from the NP-complete Dominating Set problem. The decision version of this problem is defined by a network, $H = (V', E')$, where $V' = \{v_1, \ldots, v_n\}$, and a constant $k \in \mathbb{N}$, where the goal is to determine whether there exists $V^* \subseteq V'$ such that $|V^*| = k$ and every node outside $V^*$ has at least one neighbor in $V^*$, i.e., $\forall_{v \in V' \setminus V^*}\, N_H(v) \cap V^* \neq \emptyset$.

Let $(H, k)$ be a given instance of the Dominating Set problem. Let us assume that $k < n - 1$; all other instances can be easily solved in polynomial time. We will now construct an instance of the Hiding Source by Modifying Edges problem.

First, let us construct a network $G = (V, E)$ where:
• $V = V' \cup \{v^\dagger, u, w, x\} \cup \bigcup_{i=1}^{n-2} \{a_i\}$,
• $E = E' \cup \{(v^\dagger, u), (u, w), (w, x)\} \cup \bigcup_{i=1}^{n-2} \{(a_i, x)\}$.

An example of the construction of the network $G$ is presented in Figure 30.

Figure 30: Construction of the network used in the proof of Theorem 12. Green dotted edges are allowed to be added. Infected nodes in $G$ are highlighted in black.

Now, consider the instance $(G, v^\dagger, I, \sigma, \omega, b, \widehat{A}, \widehat{R})$ of the problem of Hiding Source by Modifying Edges, where:
• $G$ is the network we just constructed,
• $v^\dagger$ is the evader,
• $I = \{v^\dagger, u, w\}$,
• $\sigma$ is the Monte Carlo source detection algorithm,
• $\omega = 2$ is the safety threshold,
• $b = k$, where $k$ is the size of the dominating set from the Dominating Set problem instance,
• $\widehat{A} = \{v^\dagger\} \times V'$, i.e., only edges between $v^\dagger$ and the members of $V'$ can be added,
• $\widehat{R} = \emptyset$, i.e., none of the edges can be removed.

Since $\widehat{R} = \emptyset$, for any solution to the constructed instance of the Hiding Source by Modifying Edges problem we must have $R^* = \emptyset$. Hence, we will omit any mentions of $R^*$ in the remainder of the proof, and we will assume that a solution consists just of $A^*$.

Moreover, the Monte Carlo source detection algorithm needs to be parameterized with the model of spreading. Let the algorithm be parameterized with the model where the probability of propagation is $p = 1$ and the time of diffusion is $T = 2$. Notice that since the model with $p = 1$ is deterministic, all Monte Carlo samples starting from a given node always give the same result, hence the formula of the Monte Carlo source detection algorithm is in this case:
$$\sigma_{mcarlo}(v, G, I) = \exp\left(-\frac{\left(\frac{|I \cap I_v|}{|I \cup I_v|} - 1\right)^2}{a^2}\right)$$
where $I_v$ is the set of infected nodes in the Monte Carlo sample starting the diffusion at $v$. Moreover, notice that the position of a given node $v \in I$ in the ranking generated by the Monte Carlo source detection algorithm depends solely on the value of $C_v = \frac{|I \cap I_v|}{|I \cup I_v|}$ (in particular, the node with the greatest value of $C_v$ is selected as the source of diffusion).

Let $V'_A$ denote the set of nodes from $V'$ connected with $v^\dagger$ after the addition of $A$, i.e., $V'_A = \{v \in V' : (v^\dagger, v) \in A\}$.
Let us compute the values of $C_v$ for the nodes in $I$ (notice that these are the only nodes that can be selected as the source of diffusion by the Monte Carlo algorithm) after the addition of an arbitrary $A$:
• $C_{v^\dagger} = \frac{3}{|A| + 3 + m_A}$, where $m_A$ is the number of nodes in $V' \setminus V'_A$ with at least one neighbor in $V'_A$,
• $C_u = \frac{3}{|A| + 4}$,
• $C_w = \frac{3}{n + 2}$.

Since the safety threshold is $\omega = 2$, both $u$ and $w$ need to have a greater value of $C_v$ than $v^\dagger$. It is easy to see that $C_w > C_{v^\dagger}$ if and only if $|A| + m_A = n$. Notice also that if $|A| + m_A = n$ then also $m_A > 1$, since $|A| \leq k$ (as the budget of the evader is $k$) and since we assumed that $k < n - 1$. Therefore, the safety threshold is met if and only if $|A| + m_A = n$.

We will now show that the constructed instance of the Hiding Source by Modifying Edges problem has a solution if and only if the given instance of the Dominating Set problem has a solution.

Assume that there exists a solution to the given instance of the Dominating Set problem, i.e., a subset $V^* \subseteq V'$ of size $k$ such that all other nodes have a neighbor in $V^*$. After adding to $G$ the set $A^* = \{v^\dagger\} \times V^*$ we have that $|A^*| = k$ and $m_{A^*} = n - k$, which implies that the safety threshold is met. We showed that if there exists a solution to the given instance of the Dominating Set problem, then there also exists a solution to the constructed instance of the Hiding Source by Modifying Edges problem.

Assume that there exists a solution $A^*$ to the constructed instance of the Hiding Source by Modifying Edges problem. As shown above, we must have $|A^*| + m_{A^*} = n$. Since the budget of the evader is $k$, the size of $A^*$ is at most $k$. Therefore $V'_{A^*}$ is a dominating set in $H$ of size at most $k$ (we can add $k - |A^*|$ arbitrarily chosen elements to obtain a set of size exactly $k$).
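The values of $C_v$ used in this argument can be reproduced on a small instance. The sketch below is a hedged illustration in Python: it builds the network $G$ from a toy $H$, assuming $n - 2$ auxiliary nodes $a_i$ attached to $x$ (an assumption of this sketch), and it evaluates $C_v$ as the overlap between $I$ and the set of nodes within $T = 2$ hops, which is what a deterministic spread with $p = 1$ reaches. All function and node names are hypothetical.

```python
from collections import deque
from fractions import Fraction


def ball(adj, source, radius):
    """Nodes within `radius` hops of `source` (reach of a p = 1 spread)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def build_g(h_adj, added_targets):
    """Construct G from H and connect the evader to `added_targets`."""
    n = len(h_adj)
    aux = [f"a{i}" for i in range(1, n - 1)]  # assumed: n - 2 nodes a_i
    adj = {v: set(nbs) for v, nbs in h_adj.items()}
    for name in ["evader", "u", "w", "x"] + aux:
        adj[name] = set()
    edges = [("evader", "u"), ("u", "w"), ("w", "x")]
    edges += [(a, "x") for a in aux]
    edges += [("evader", v) for v in added_targets]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def overlap_scores(adj, infected, T=2):
    """C_v = |I & I_v| / |I | I_v| for every infected node v."""
    scores = {}
    for v in infected:
        reached = ball(adj, v, T)
        scores[v] = Fraction(len(infected & reached), len(infected | reached))
    return scores


def is_hidden(scores):
    """Safety threshold 2: both u and w must outrank the evader."""
    return scores["u"] > scores["evader"] and scores["w"] > scores["evader"]


# Toy H: a star with center v1 (n = 5), so {v1} is a dominating set (k = 1).
H = {"v1": {"v2", "v3", "v4", "v5"},
     "v2": {"v1"}, "v3": {"v1"}, "v4": {"v1"}, "v5": {"v1"}}
I = {"evader", "u", "w"}
good = overlap_scores(build_g(H, ["v1"]), I)  # link to the dominating node
bad = overlap_scores(build_g(H, ["v2"]), I)   # link to a leaf instead
```

Linking the evader to the dominating node `v1` gives $|A| + m_A = 1 + 4 = n$ and the evader falls below both `u` and `w`; linking to the leaf `v2` gives $|A| + m_A = 2 < n$ and the evader is not hidden.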
We showed that if there exists a solution to the constructed instance of the Hiding Source by Modifying Edges problem, then there also exists a solution to the given instance of the Dominating Set problem.

This concludes the proof.

Concealment With and Without Strategic Manipulation
In the main article, we discussed the concept of disentangling the two different notions of hiding: one provided by the structure of the network itself, and the other being the result of strategic manipulation. Figure 2 in the main article shows the results of this analysis for the Eigenvector source detection algorithm. Here, we present analogous results for all source detection algorithms considered in the study; see Figures 31 to 36.

As can be seen, the scale-free structure of the Barabási-Albert networks tends to keep the evader well-hidden (and even more so when the size and density of the network increase), whereas the identity of the evader is much more exposed in the other network structures. Next, we comment on the susceptibility of different network structures to strategic manipulation, starting with the addition of nodes, followed by the modification of edges. In particular, when hiding by adding nodes, Erdős-Rényi and Watts-Strogatz networks demonstrate less resilience to manipulation compared to their Barabási-Albert counterpart. Moreover, the effectiveness of hiding seems to be affected by the average degree of the network in a much more significant way than by the number of nodes in the network; the effect can be either positive or negative, depending on the source detection algorithm. Finally, let us comment on the susceptibility to strategic manipulation when the evader modifies edges.
In this case, for most source detection algorithms, hiding the source of diffusion is more effective in scale-free networks, and the effectiveness increases in larger and denser scale-free networks. In contrast, no clear patterns were found for the other two types of network structure.

[Figures 31 to 36: panels A, B, C; rows: Before hiding, Adding nodes, Modifying edges; x-axis: Size; y-axis: Average degree; color scales: Average rank and Change in rank.]

Figure 31: The same as Figure 2 in the main article, but for the Degree source detection algorithm instead of Eigenvector.

Figure 32: The same as Figure 2 in the main article, but for the Closeness source detection algorithm instead of Eigenvector.

Figure 33: The same as Figure 2 in the main article, but for the Rumor source detection algorithm instead of Eigenvector. White hexagons imply that all hiding strategies increase the visibility of the evader.

Figure 34: The same as Figure 2 in the main article, but for the Random walk source detection algorithm instead of Eigenvector. White hexagons imply that all hiding strategies increase the visibility of the evader.

Figure 35: The same as Figure 2 in the main article, but for the Monte Carlo source detection algorithm instead of Eigenvector.

Figure 36: The same as Figure 2 in the main article, but for the Betweenness source detection algorithm instead of Eigenvector.
Hiding by Adding Bots—Additional Results
Examples of applying different heuristics that add nodes to the network are presented in Figure 37. In the examples, each bot is connected to only one supporter.
[Figure 37: columns: Hub, Degree, Random; rows: Supporters only, Supporters + clique.]

Figure 37: Examples of heuristics that add new nodes to the network. Red square nodes represent bots (new nodes), while rounded nodes represent network members. The size of each rounded node corresponds to its degree. Each column corresponds to a different way of selecting supporters to be connected with the bots. The first row shows examples of heuristics that connect bots with supporters only, while the second row shows examples of heuristics that additionally connect all bots into a clique.

Next, Figures 38 and 39 present additional results; the first of these figures shows the change in the evader's ranking during the hiding process, while the second figure shows the evader's ranking after the hiding process. Several observations can be made based on these results. Regarding the different network structures, the source detection algorithms are on average less effective in the preferential-attachment networks generated using the Barabási-Albert model (i.e., the evader's position in the ranking is usually low to begin with). Regarding the different source detection algorithms, Betweenness and Rumor are on average the most resilient to hiding attempts (i.e., the average drop in the evader's ranking is the smallest), whereas Monte Carlo, Eigenvector, and Random walk are usually the least resilient. Regarding the selection of supporters to connect the bots with, it is usually beneficial to link bots with many different supporters (using either the random or degree heuristics), rather than connecting them all to the same set of supporters (using the hub heuristic), with the only exception being the Monte Carlo source detection algorithm. As for the question of whether bots should be connected into a clique, our simulations show that this would indeed result in superior hiding of the evader.
Moreover, in the majority of cases, connecting each bot to three supporters proves to be more effective than connecting the bot to just a single supporter.

[Figure 38: columns: Barabási-Albert, Erdős-Rényi, Watts-Strogatz; rows: Degree, Eigenvector, Closeness, Rumor, Monte Carlo, Random walk, Betweenness; x-axis: Bots added; y-axis: Evader's ranking.]

Figure 38: Results of simulations with hiding the source of diffusion by adding bots to randomly generated networks with , nodes and an average degree of . The y-axis corresponds to the ranking of the evader according to the source detection algorithm (greater values indicate more efficient hiding), while the x-axis corresponds to the number of nodes added to the network. Each color corresponds to a different heuristic. Solid lines correspond to cases where each bot is connected to a single supporter, while dashed lines correspond to cases where each bot is connected to three supporters. Shaded areas represent confidence intervals.
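The bot-adding heuristics compared above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the simulations: the set of supporters is taken as given, "hub" is read as attaching every bot to the same highest-degree supporters, "degree" as walking down the degree-sorted supporter list so that different bots get different supporters, and "random" as uniform sampling. The function name, parameter names, and bot labels are hypothetical.

```python
import random


def add_bots(adj, supporters, num_bots, links_per_bot=1,
             heuristic="random", clique=False, seed=0):
    """Return a copy of `adj` with bots attached, plus the list of bot names."""
    rng = random.Random(seed)
    new_adj = {v: set(nbs) for v, nbs in adj.items()}
    # Supporters ranked by decreasing degree in the original network
    # (ties broken by label so that the result is deterministic).
    ranked = sorted(supporters, key=lambda v: (-len(adj[v]), str(v)))
    bots = [f"bot{i}" for i in range(num_bots)]
    cursor = 0
    for bot in bots:
        new_adj[bot] = set()
        if heuristic == "hub":
            # Assumed reading: every bot links to the same top supporters.
            chosen = ranked[:links_per_bot]
        elif heuristic == "degree":
            # Assumed reading: move down the ranking, wrapping around.
            chosen = [ranked[(cursor + j) % len(ranked)]
                      for j in range(links_per_bot)]
            cursor += links_per_bot
        elif heuristic == "random":
            chosen = rng.sample(ranked, links_per_bot)
        else:
            raise ValueError(f"unknown heuristic: {heuristic}")
        for s in chosen:
            new_adj[bot].add(s)
            new_adj[s].add(bot)
    if clique:
        # Optionally connect all bots with each other.
        for a in bots:
            for b in bots:
                if a != b:
                    new_adj[a].add(b)
    return new_adj, bots


# Hypothetical toy network; every member is treated as a supporter here.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
hub_net, hub_bots = add_bots(toy, [0, 1, 2, 3], 3, heuristic="hub")
deg_net, deg_bots = add_bots(toy, [0, 1, 2, 3], 3, heuristic="degree")
clq_net, clq_bots = add_bots(toy, [0, 1, 2, 3], 3, heuristic="hub", clique=True)
```

Passing `links_per_bot=3` corresponds to the three-supporter variant; `links_per_bot` must not exceed the number of supporters when the random heuristic is used.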
[Figure 39: heatmaps for Barabási-Albert, Erdős-Rényi, and Watts-Strogatz networks; rows: source detection algorithms (Betweenness, Rumor, Degree, Closeness, Random Walk, Eigenvector, Monte Carlo); columns: heuristics (Hub, Degree, Random, each with and without clique).]

Figure 39: Results of simulations with hiding the source of diffusion by adding nodes to randomly generated networks with , nodes with an average degree of . The y-axis of each heatmap corresponds to different source detection algorithms, whereas the x-axis corresponds to a different heuristic. The value in each cell indicates the change in the evader's ranking according to the source detection algorithm after adding bots to the network using the heuristic. Rows and columns are sorted by average value.

Hiding by Modifying Edges—Additional Results
Examples of applying different heuristics that modify edges of the network are presented in Figure 40.
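As a rough illustration of the edge-modification heuristics (max degree, min degree, random; adding or removing), the following Python sketch applies one of them to an adjacency dictionary. It assumes that added edges connect the evader to current non-neighbors, that removed edges disconnect it from current neighbors, and that degrees are re-evaluated after every change; these details, along with all names, are assumptions of the sketch rather than a description of the simulation code.

```python
import random


def modify_edges(adj, evader, budget, action, selection, seed=0):
    """Return a copy of `adj` after adding or removing up to `budget` edges."""
    rng = random.Random(seed)
    new_adj = {v: set(nbs) for v, nbs in adj.items()}
    for _ in range(budget):
        if action == "remove":
            # Candidates: current neighbors of the evader.
            candidates = sorted(new_adj[evader], key=str)
        elif action == "add":
            # Candidates: nodes not yet connected to the evader.
            candidates = sorted(
                (v for v in new_adj
                 if v != evader and v not in new_adj[evader]),
                key=str,
            )
        else:
            raise ValueError(f"unknown action: {action}")
        if not candidates:
            break
        if selection == "max degree":
            target = max(candidates, key=lambda v: len(new_adj[v]))
        elif selection == "min degree":
            target = min(candidates, key=lambda v: len(new_adj[v]))
        elif selection == "random":
            target = rng.choice(candidates)
        else:
            raise ValueError(f"unknown selection: {selection}")
        if action == "remove":
            new_adj[evader].discard(target)
            new_adj[target].discard(evader)
        else:
            new_adj[evader].add(target)
            new_adj[target].add(evader)
    return new_adj


# Hypothetical toy network with node 0 as the evader.
toy = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1}, 3: {0}, 4: {1}}
removed_max = modify_edges(toy, 0, 1, "remove", "max degree")
removed_min = modify_edges(toy, 0, 1, "remove", "min degree")
added_max = modify_edges(toy, 0, 1, "add", "max degree")
```

A mixed strategy could be sketched by alternating calls with `action="remove"` and `action="add"`, each with its own selection rule.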
[Figure 40: columns: Max degree, Min degree, Random; rows: Adding, Removing.]

Figure 40: Examples of heuristics that modify edges of the network. The red node in each network represents the evader. The size of each node corresponds to its degree. Green dashed edges represent newly added connections, while dotted red edges represent newly removed connections. The first row shows examples of heuristics that add edges to the network, while the second row shows examples of heuristics that remove edges from the network. Each column corresponds to a different way of selecting the nodes that will be connected to, or disconnected from, the evader.

Figures 41 to 44 present the results of our simulations on networks with , nodes, with Figures 41 and 42 showing the results for the pure strategies, and Figures 43 and 44 showing the results for the mixed strategies. Our results indicate that removing existing edges from the network is significantly more effective in hiding the source of diffusion than adding new edges. In fact, adding new connections incident with the evader exposes the evader even more to the source detection algorithms in the vast majority of cases. Mixing between the two types of strategies typically provides worse performance in terms of hiding than simply running the removal component of the strategy. Regarding the choice of the neighbor to disconnect the evader from, selecting the neighbors with the greatest degree provides the best hiding to the evader, followed by neighbors selected at random, with low-degree neighbors being the worst choice.
Nevertheless, in many cases all three removal heuristics provide very similar performance.

[Figure 41: columns: Barabási-Albert, Erdős-Rényi, Watts-Strogatz; rows: Degree, Eigenvector, Closeness, Rumor, Monte Carlo, Random walk, Betweenness; x-axis: Edges changed; y-axis: Evader's ranking; legend: Adding, Removing; Max degree, Min degree, Random.]

Figure 41: Results of hiding the source of diffusion by modifying edges in networks consisting of , nodes with an average degree of . The y-axis represents the evader's ranking according to the source detection algorithm (greater values indicate more effective hiding); the x-axis corresponds to the number of edges added to, or removed from, the network. Each color corresponds to a different way of choosing edges, while each line type (dashed or solid) corresponds to either adding or removing. Shaded areas represent confidence intervals.

[Figure 42: heatmaps for Barabási-Albert, Erdős-Rényi, and Watts-Strogatz networks; rows: source detection algorithms; columns: Removing max degree, Removing random, Removing min degree, Adding max degree, Adding random, Adding min degree.]

Figure 42: Results of hiding the source of diffusion by modifying edges in networks consisting of , nodes with an average degree of . In each heatmap, rows correspond to different source detection algorithms, while columns correspond to different heuristics. The value in each cell indicates the change in the evader's ranking according to the source detection algorithm as a result of adding or removing edges to the network, depending on the heuristic. Positive values indicate that the evader became more hidden, with greater values indicating a more effective disguise. In contrast, negative values indicate that the evader became less hidden. Rows and columns are sorted by average value.

[Figure 43: heatmaps for Barabási-Albert, Erdős-Rényi, and Watts-Strogatz networks, one row of panels per source detection algorithm (Degree, Eigenvector, Closeness, Rumor); rows: Removing max degree, Removing min degree, Removing random; columns: Adding max degree, Adding min degree, Adding random.]

Figure 43: Results of hiding the source of diffusion by modifying edges in networks consisting of , nodes with an average degree of . In each heatmap, rows correspond to different removal heuristics, while columns correspond to different addition heuristics. The first column and the last row represent pure strategies, while the remaining cells represent mixed strategies. The value in each cell indicates the change in the evader's ranking according to the source detection algorithm after adding or removing edges, depending on the heuristic. Positive values indicate that the evader became more hidden, with greater values indicating a more effective disguise. In contrast, negative values indicate that the evader became less hidden. Rows and columns are sorted by average value.
62 arab´asi-Albert Erd ˝os-R´enyi Watts-Strogatz M on t e C a r l o Removing max degreeRemoving min degreeRemoving random A dd i ng m a x deg r ee A dd i ng m i n deg r ee A dd i ng r ando m Removing max degreeRemoving min degreeRemoving random A dd i ng m a x deg r ee A dd i ng m i n deg r ee A dd i ng r ando m Removing max degreeRemoving min degreeRemoving random A dd i ng m a x deg r ee A dd i ng m i n deg r ee A dd i ng r ando m R a ndo m w a l k -1.8812.49 -1.14-2.15 -1.8 -3.3610.0114.16 -2.21-3.06 -1.28 -1.66-3.47-1.71 -1.43 Removing max degreeRemoving min degreeRemoving random A dd i ng m a x deg r ee A dd i ng m i n deg r ee A dd i ng r ando m -0.87 -0.61-0.27-0.56-0.45-0.64 -0.81-0.15-0.45-0.42 -0.29 -0.85-0.41 -0.49-0.82 Removing max degreeRemoving min degreeRemoving random A dd i ng m a x deg r ee A dd i ng m i n deg r ee A dd i ng r ando m -0.9-1.65 -0.63-0.77-0.62-0.76 -0.73-0.66-0.69-0.62 -0.83-1.65 -0.86-0.67-1.65 Removing max degreeRemoving min degreeRemoving random A dd i ng m a x deg r ee A dd i ng m i n deg r ee A dd i ng r ando m -30-20-10010 B e t w ee nn ess Removing max degreeRemoving min degreeRemoving random A dd i ng m a x deg r ee A dd i ng m i n deg r ee A dd i ng r ando m Removing max degreeRemoving min degreeRemoving random A dd i ng m a x deg r ee A dd i ng m i n deg r ee A dd i ng r ando m Removing max degreeRemoving min degreeRemoving random A dd i ng m a x deg r ee A dd i ng m i n deg r ee A dd i ng r ando m -1001020 Figure 44: The same as Figure 43, except for the difference in the source detection algorithms beingconsidered in the analysis. 63
Comparison of Effectiveness of Adding Bots and Modifying Edges
In the main article, we compared the effectiveness of hiding the source of diffusion using two different types of network modifications: adding nodes and modifying edges. Figure 3 in the main article depicts the results of this analysis for the Eigenvector source detection algorithm. Here, we present analogous results for all other source detection algorithms considered in the study; see Figure 45.

As can be seen, the results for most source detection algorithms exhibit trends similar to those observed for the Eigenvector algorithm in the main article. More specifically, in the vast majority of cases, modifying one edge is significantly more effective than adding a single bot to the network (as indicated by values greater than in the heatmaps). This tendency is particularly strong in the networks generated using the Barabási-Albert model, where for some networks it takes more than a hundred bots to achieve the same effect as one edge. In contrast, in many networks generated using the Watts-Strogatz model, the addition of bots is more effective than modifying edges.
[Figure 45 panels: one row of heatmaps for each of the Degree, Closeness, Rumor, Monte Carlo, Random walk, and Betweenness source detection algorithms. Axes: network size and average degree; color scale: effectiveness.]

Figure 45: The same as Figure 3 in the main article, except for the difference in the source detection algorithms being considered in the analysis. For cells marked with minuses, at least one type of heuristic (adding bots or modifying edges) does not reduce the evader's ranking.
Hiding in Large Networks
All the experiments presented thus far in the Supplementary Materials consider only , nodes. Next, we evaluate the effectiveness of hiding the source of diffusion in much larger networks. Specifically, we consider random networks with , nodes generated using the Barabási-Albert, the Erdős-Rényi, and the Watts-Strogatz models. As mentioned in the main article, due to the excessive amount of computation needed to handle such networks, we approximate the evader's ranking. The approximation is done by computing the ranking of the evader not among all nodes, but rather among , nodes, consisting of the , infected nodes with the greatest degrees and another , infected nodes chosen uniformly at random from the remaining ones. Furthermore, in our approximation we do not consider the Betweenness and Random walk source detection algorithms, since their ranking cannot be efficiently computed for just a selected subset of nodes.

Figures 46 to 49 present the results of our simulations. Generally, the results show patterns similar to those computed for smaller networks. For networks generated using the Barabási-Albert model, the evader is relatively well-hidden even without executing any heuristics, whereas in other types of networks the evader is usually exposed at the beginning of the process, typically occupying the top position in the ranking. What is more, unlike the case with the Barabási-Albert model, applying the heuristics given the other network models has a negligible effect, with the evader's ranking decreasing by only a few positions in most cases.
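The ranking approximation described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function name, its parameters (`n_top`, `n_random`), the convention that a higher score marks a more suspected node, the exclusion of the evader from the reference set, and the handling of ties are all assumptions made for the example.

```python
import random


def approximate_evader_ranking(score, degree, infected, evader,
                               n_top, n_random, rng=None):
    """Rank the evader against a subset of infected nodes (hypothetical helper).

    score    -- dict: node -> source detection score (assumed: higher = more suspected)
    degree   -- dict: node -> degree in the network
    infected -- iterable of infected nodes (the evader is expected to be among them)
    n_top    -- number of highest-degree infected nodes in the reference set
    n_random -- number of further infected nodes sampled uniformly at random
    """
    rng = rng or random.Random(0)

    # Assumption: the evader is compared against other infected nodes only.
    others = [v for v in infected if v != evader]

    # Infected nodes with the greatest degrees (stable sort keeps input order on ties).
    by_degree = sorted(others, key=lambda v: -degree[v])
    top = by_degree[:n_top]
    rest = by_degree[n_top:]

    # Uniform random sample, without replacement, from the remaining infected nodes.
    sampled = rng.sample(rest, min(n_random, len(rest)))

    reference = top + sampled
    # Position 1 means that no reference node is more suspected than the evader.
    return 1 + sum(1 for v in reference if score[v] > score[evader])


# Toy usage: the node's degree doubles as its score (an assumption for the demo).
toy_degree = {0: 5, 1: 3, 2: 3, 3: 2, 4: 1, 5: 1}
print(approximate_evader_ranking(toy_degree, toy_degree, list(toy_degree),
                                 evader=3, n_top=2, n_random=3))
```

Only the scores of the reference nodes and of the evader are needed here, which is in line with the remark above that algorithms whose ranking cannot be computed efficiently for a selected subset of nodes are left out of the approximation.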
[Figure 46 panels: columns are Barabási-Albert, Erdős-Rényi, and Watts-Strogatz networks; rows are the Degree, Eigenvector, Closeness, Rumor, and Monte Carlo source detection algorithms. x-axis: bots added; y-axis: evader's ranking. Legend: Degree, Degree clique, Hub, Hub clique, Random, Random clique; 3 supporters.]

Figure 46: The same as Figure 38, but for networks with , nodes (instead of , ) and only for the case where supporters are connected to each bot.

[Figure 47 panels: columns are Barabási-Albert, Erdős-Rényi, and Watts-Strogatz networks; rows are the Degree, Eigenvector, Closeness, Rumor, and Monte Carlo source detection algorithms. x-axis: edges changed; y-axis: evader's ranking. Legend: adding or removing; max degree, min degree, random.]

Figure 47: The same as Figure 41, but for networks with , nodes instead of , nodes.

[Figure 48 heatmaps: one each for Barabási-Albert, Erdős-Rényi, and Watts-Strogatz networks; rows are the Degree, Rumor, Closeness, Eigenvector, and Monte Carlo source detection algorithms; columns are the Degree clique, Random clique, Hub clique, Hub, Random, and Degree heuristics.]

Figure 48: The same as Figure 39, but for networks with , nodes (instead of , nodes) and only for the case in which supporters are connected to each bot.

[Figure 49 heatmaps: one each for Barabási-Albert, Erdős-Rényi, and Watts-Strogatz networks; rows are the Rumor, Degree, Eigenvector, Monte Carlo, and Closeness source detection algorithms; columns are the heuristics removing max degree, removing random, removing min degree, adding min degree, adding random, and adding max degree.]

Figure 49: The same as Figure 42, but for networks with , nodes instead of , nodes.

Hiding in Real-Life Networks
We now evaluate the effectiveness of heuristics hiding the source of diffusion in real-life networks. In particular, we consider the following networks:

• St Lucia [35]: colocation network collected at the St Lucia campus of the University of Queensland using the WiFi network. The network consists of 302 nodes and 1149 edges. Nodes represent students, while an edge between any two students indicates that they were in the same location at the same time.
• Facebook [25]: fragment of the Facebook network, an ego-network of a student of an American university. The network consists of 333 nodes and 2523 edges. Nodes represent Facebook users, while an edge between any two users indicates that they are friends on Facebook.
• Copenhagen [1]: retweet network of the tweets regarding the United Nations Climate Change conference held in Copenhagen in December 2009. The network consists of 761 nodes and 1029 edges. Nodes represent Twitter users, while an edge between two users indicates that at least one of them responded to a tweet by the other.
• Gnutella [29]: Gnutella peer-to-peer file sharing network snapshot from August 8, 2002. The network consists of 6299 nodes and 20776 edges. Nodes represent hosts, while an edge between two nodes indicates that the two hosts exchanged a file.
• PGP [7]: the network of users of the Pretty-Good-Privacy (PGP) algorithm for secure information exchange from 2004. The network consists of 10680 nodes and 24316 edges. Nodes represent users of the algorithm, while an edge between two nodes indicates that they mutually signed their public keys within the protocol (i.e., they formed a trust relationship).
• Prostitution [30]: the giant component of a sexual contacts network between escorts and customers from a Brazilian online community. The network consists of 15810 nodes and 38540 edges. A node represents either an escort or a customer, while an edge between an escort and a customer indicates that the former reported a sexual contact with the latter.

The experimental procedure for the first three (smaller) real-life networks remains the same as for random networks with , nodes in the main article, whereas the procedure for the last three (larger) real-life networks is the same as for random networks with , nodes.

The results of our simulations for smaller networks are presented in Figures 50 to 53, while the results for larger networks are presented in Figures 54 to 57. Regarding the smaller networks, the results seem to be largely consistent with the observations made for the random networks. One noticeable difference is the reduced effectiveness of heuristics adding nodes against the Rumor source detection algorithm in the St Lucia and the Facebook networks. It might be caused by the relatively large density of these two networks, causing most of the BFS trees computed by the Rumor algorithm to be star-like. In the case of the larger networks, the evader is hidden even without any strategic manipulations to a greater degree than in randomly generated networks, suggesting that correctly identifying the source of diffusion in the real world might be even harder than our simulations on synthetic data indicate.
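The density argument can be checked with simple arithmetic on the node and edge counts listed above. The sketch below assumes that each network is treated as a simple undirected graph, so that the average degree is 2m/n and the density is 2m/(n(n-1)) for n nodes and m edges; the dictionary and function names are illustrative and not part of the original study.

```python
# (number of nodes, number of edges) as reported in the dataset descriptions.
NETWORKS = {
    "St Lucia": (302, 1149),
    "Facebook": (333, 2523),
    "Copenhagen": (761, 1029),
    "Gnutella": (6299, 20776),
    "PGP": (10680, 24316),
    "Prostitution": (15810, 38540),
}


def average_degree(n, m):
    """Average degree of a simple undirected graph with n nodes and m edges."""
    return 2 * m / n


def density(n, m):
    """Fraction of all possible undirected edges that are present."""
    return 2 * m / (n * (n - 1))


# List the networks from the densest to the sparsest.
for name, (n, m) in sorted(NETWORKS.items(),
                           key=lambda item: density(*item[1]),
                           reverse=True):
    print(f"{name:12s} avg degree = {average_degree(n, m):6.2f}  "
          f"density = {density(n, m):.4f}")
```

Under these assumptions, Facebook and St Lucia come out as the two densest of the six networks, which is consistent with the explanation offered above for the weaker effect of the node-adding heuristics against the Rumor algorithm.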
[Figure 50 panels: columns are the St Lucia, Facebook, and Copenhagen networks; rows are the Degree, Eigenvector, Closeness, Rumor, Monte Carlo, Random walk, and Betweenness source detection algorithms. x-axis: bots added; y-axis: evader's ranking.]

Figure 50: The same as Figure 38, but for small real-life networks instead of randomly generated networks.

[Figure 51 panels: columns are the St Lucia, Facebook, and Copenhagen networks; rows are the Degree, Eigenvector, Closeness, Rumor, Monte Carlo, Random walk, and Betweenness source detection algorithms. x-axis: edges changed; y-axis: evader's ranking. Legend: adding or removing; max degree, min degree, random.]

Figure 51: The same as Figure 41, but for small real-life networks instead of randomly generated networks.

[Figure 52 heatmaps: one each for the St Lucia, Facebook, and Copenhagen networks; rows are the seven source detection algorithms; columns are the Degree clique, Random clique, Hub clique, Random, Degree, and Hub heuristics.]

Figure 52: The same as Figure 39, but for small real-life networks instead of randomly generated networks.
[Figure 53 heatmaps: one each for the St Lucia, Facebook, and Copenhagen networks; rows are the seven source detection algorithms; columns are the heuristics removing max degree, removing random, removing min degree, adding min degree, adding random, and adding max degree.]

Figure 53: The same as Figure 42, but for small real-life networks instead of randomly generated networks.

[Figure 54 panels: columns are the Gnutella, PGP, and Prostitution networks; rows are the Degree, Eigenvector, Closeness, Rumor, and Monte Carlo source detection algorithms. x-axis: bots added; y-axis: evader's ranking. Legend: Degree, Degree clique, Hub, Hub clique, Random, Random clique; 3 supporters.]

Figure 54: Same as Figure 46, but for large real-life networks instead of randomly generated networks.

[Figure 55 panels: columns are the Gnutella, PGP, and Prostitution networks; rows are the Degree, Eigenvector, Closeness, Rumor, and Monte Carlo source detection algorithms. x-axis: edges changed; y-axis: evader's ranking. Legend: adding or removing; max degree, min degree, random.]

Figure 55: Same as Figure 47, but for large real-life networks instead of randomly generated networks.

[Figure 56 heatmaps: one each for the Gnutella, PGP, and Prostitution networks; rows are the Degree, Rumor, Monte Carlo, Closeness, and Eigenvector source detection algorithms; columns are the Degree clique, Random clique, Hub clique, Hub, Random, and Degree heuristics.]

Figure 56: Same as Figure 48, but for large real-life networks instead of randomly generated networks.
[Heatmaps for the Gnutella, PGP, and Prostitution networks: rows are the Eigenvector, Rumor, Monte Carlo, Closeness, and Degree source detection algorithms; columns are the heuristics removing max degree, removing random, removing min degree, adding min degree, adding random, and adding max degree.]