Tamper-Evident Complex Genomic Networks

Abstract

Networks are important data structures now used to store personal information about individuals around the globe. With the advent of personal genome sequencing, networks will also be used to store people's genomic sequences. In contrast to social media networks, relationships in a genomic network are extremely significant: losing a connection between individuals means losing relationship information (e.g. father or son). Current approaches to storing network data have a serious problem. Simply stated, network data is not tamper-evident. In other words, if links or nodes were changed, removed, or added by a malicious attacker, the administrator would be unable to detect the change. In social media networks, changes in node characteristics and links can damage relationships; in networks that store personal genomes, the results could be truly devastating. Here we present a scheme for building tamper-evident networks using a combination of cryptographic and ego-based network analytic methods. Using actual published data sets, we demonstrate the utility and validity of the scheme and show how it works in various usage scenarios. Results from extensive experiments demonstrate the validity of the proposed approach.


Komal Batool and Muaz A. Niazi
National University of Science & Technology, Information Security Department, Islamabad, 44000, Pakistan
COMSATS Institute of IT, Computer Science Department, Islamabad, 44000, Pakistan
* [email protected]
+ these authors contributed equally to this work


Introduction

In less than a decade, the idea of networks has evolved from being considered a purely theoretical concept in Computer Science to being used almost everywhere. Current examples include Online Social Networks (OSNs) such as Facebook, Twitter, Reddit, and Google+, among others. The core idea of a social network (or simply network) is based on connectivity between different entities in the same system. Networks can range from social to biological or computational in nature. Various chaotic effects can be observed in networks. Oftentimes, it is important to study such complex systems from a multidisciplinary perspective. Other examples of multidisciplinary approaches include the use of a complex systems approach to evaluate emotions in Hollywood and brain-computer interfaces.

The traditional methods of studying complex networks involve developing models and performing analysis, which needs to be validated, as we have previously discussed. The actual effects of tampering in such networks depend primarily on the information contained in the network. Even if the network were merely a social network, unexpected changes in information could result in anything ranging from discontent to severe relationship setbacks. This would be compounded for individuals connected across multiple networks, as in multiplex networks.

If, however, the data were something as important as the personal genomic information of individuals, the implications would be far greater. This data could be linked with other patients who are genetically related to each other, or even with relatives unrelated by blood (such as spouses). This is an important change because, if there is no mechanism for detecting tampering in a social network, the data in such social network databases would be prone to hacking and tampering, in addition to the need for privacy.
Effects on the virtual perceptions of individuals can also result in severe symptoms in the actual lives of individuals.

In the past, various techniques and security mechanisms have been introduced for securing networks. There are a number of ways in which intrusions in networks can be detected based on the features of attacks, some of which include detecting the pathway accessed by the attacker, the exploitation of system vulnerabilities, and the targeting of data or data collection methods in networks. Previous literature has classified attacks according to the intents of attackers as well as their level of knowledge and expertise in penetrating systems. Unfortunately, social networks still lack mechanisms that let administrators verify whether anything has been tampered with. The absence of such complete solutions implies the vulnerability of such networks. This is especially difficult in genomic social networks because of the large size of the data, which often requires novel techniques to handle effectively.

The goal of this paper is to present a comprehensive mechanism to ensure tamper detection in any type of network, with a specific focus on genomic networks. We also demonstrate the validity of the algorithms by means of an application on the links between social entities of the network. This allows for a focus on the network structure rather than on the attributes of nodes. The contributions of our work can be summarized as follows:

We present a cognitive digital footprinting method allowing for a tamper-evident model for networks. To evaluate nodes' positions according to their impact in a network, we use the centralities of nodes. In networks, stored data is not encrypted and is therefore more prone to attacks. Attackers compromise a node in a network and, due to the inter-connectivity of nodes, the attack can result in compromising more nodes in the same network. In some situations, the attacker tampers with, e.g., names or information associated with individuals, or changes the ties/links between the nodes in the network. This requires mechanisms that revise the structure of social networks to make them more resistant to tampering. Additionally, the mechanism should make it evident if such attacks occur. Our proposed method uses a combination of network centrality-based techniques and cryptographic techniques to ensure that the administrator is able to figure out whether the network has been tampered with.

The structure of the rest of the paper is as follows. First, we give background on related areas, including an overview of key network centralities. This is followed by a review of attacks on social networks. Next, we present the results and discussion, starting with initialization of the original network and then two cases of modification: one valid and the other invalid, i.e. a case of tampering by a malicious attacker. We then give details of the methods used in the paper, with methodological pipelines and flowcharts of the proposed algorithms. Other details of the modeling and analysis are presented in the supplementary information.

Background

Security experts can deploy a variety of tools to monitor networks, such as Security Information and Event Management (SIEM). Hackers can eavesdrop on network traffic and tamper with the integrity of information and processes occurring across the network. While on its face it may appear that changing the nodes and their information is not easy, in reality, once the attacker has access to the network, the entire network is compromised. Moreover, such tampering is not at all easy to detect. The typical security mechanism involves private keys for encryption, employed along with digital signatures, strong authorization, and tamper-resistant protocols across communication links. Still, these mechanisms are safe if and only if the keys themselves are not compromised and the processes are not abused. Cryptographic operations in networks might provide a reasonable level of protection for some applications. However, these often operate on low-level data and thus can be quite heavy in terms of processing, memory, and other valuable resources in big data scenarios. For protecting networks containing sensitive and important information, however, higher levels of assurance are needed.

Centralities

Freeman notes that the calculation of centrality has been a key area of research focus in social network analysis for an extended period of time. The most commonly used centrality measures include degree, closeness, betweenness, eccentricity, and eigenvector centrality, with the degree, closeness, and betweenness measures proposed by Freeman and eigenvector centrality proposed by Bonacich. Centrality is considered important by researchers because centralities formally indicate the value of nodes in the network topology. Central positions have, however, often been equated with opinion leadership or popularity. Often, researchers primarily use the degree measure of centrality, perhaps because it is the easiest to explain to non-technical audiences and its association with behavior is intuitive. In the current paper, we evaluate and validate the role of commonly used centralities in identifying nodes which are actually influential in the network. We focus on the following centralities for the analysis:

1. Degree Centrality: It is defined as the number of links of a node. The degree centrality of a node $v$ is calculated as
$$C_D(v) = \frac{k_v}{n-1} = \frac{\sum_{j \in G} a_{vj}}{n-1},$$
where $k_v$ is the degree of the node, $a_{vj}$ is the adjacency-matrix entry for nodes $v$ and $j$, and $n$ is the total number of nodes in the network.

2. Betweenness Centrality: Betweenness centrality quantifies "the number of times a node acts as a bridge along the shortest path between two other nodes". It is calculated as
$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}},$$
where $\sigma_{st}$ is the total number of shortest paths from node $s$ to node $t$ and $\sigma_{st}(v)$ is the number of those paths that pass through node $v$.

3. Closeness Centrality: It is defined as the closeness of a node to every other node in the network. It is calculated using the formula
$$C_C(v) = \frac{1}{\sum_{t \in G} \mathrm{dist}(v, t)},$$
where $v$ and $t$ are nodes of the graph $G$.

4. Eccentricity Centrality: The eccentricity centrality of a node is equal to "the largest geodesic distance between the node and any other node". Generally, when the eccentricity centrality of a node is higher, its rate of diffusion is lower. It is calculated as
$$C_{Ecc}(v) = \max_{t \in G} \mathrm{dist}(v, t),$$
where $v$ and $t$ are nodes of the graph $G$.

5. Eigenvector Centrality: It is defined as a "measure of the influence of a node in a network". The eigenvector is defined by
$$\lambda v = A v,$$
where $A$ is the adjacency matrix of the graph, $\lambda$ is a constant (the eigenvalue), and $v$ is the eigenvector.

Short Review of Attacks on Social Networks

With the recent rise in incidents at the global level, there has been a corresponding increase of interest in research on the spread and tracking of terrorism. Reid and Chen present an intellectual structure of research conducted over the last two decades. The authors focus on terrorism outbreaks around the world and present the structure of research in the domain. Visualization techniques have been used to map contemporary terrorism research domains, which includes data mining, analysis, charting, and visualizing the terrorism research area according to experts, institutions, topics, publications, and social networks. Domain mapping is an important but difficult task, because the domain is clandestine in nature and not easily accessible. Neither the intellectual structure nor the characteristics are easily traceable. Domain mapping helps experts investigate trends and validate perceptions, whereas for newcomers it reveals new research areas and supports research development in them. Their investigation suggests that prior research has a heavy influence on new research, and that highly cited authors such as Rohan Gunaratna and many others are heavily influenced by previously performed research on terrorism.

Cao highlights the importance of behavior analysis in securing systems. These systems could range from those employed for business intelligence to social computing. Behavior analysis is also used for intrusion detection in networks, for event analysis, and even for decision making. Additionally, it could be useful for business analysis, as these methods can be used to analyze markets and user preferences and to detect exceptional behavior of terrorists and criminals in networks. Traditional approaches are used to detect such behaviors, but they are not well organized, nor does transactional data cover all aspects of the representation of human behaviors.
The authors have introduced advanced ways of behavior analysis due to the inefficiency of traditional analysis. Their methods involve the use of important information such as links between entities, which helps in extracting the hidden elements in transactional data. The goal of behavior analysis is to help develop methodologies, techniques, and practical tools for representing, modeling, analyzing, and understanding networks for detecting anomalies. In a behavioral network, intrinsic mechanisms change the network from inside, which results in topological change. One of the major research issues in this domain is that behavioral elements are often interspersed in transactional data, and it can be considerably difficult to gather and analyze them in their entirety.

Zhu et al. note that social network concepts cannot be estimated easily but can only be visualized through computer simulations. Although various tools and techniques are available for visualization, they are often unable to perform complete visualization of social network concepts. The paper proposes a new visualization-based concept, presented in the form of a tool, "NetVizer", which gives a better visualization of betweenness centrality concepts. Social network analysis is often employed by organizations for data mining and for understanding decision-making processes. The idea is to maximize information flow in employee social networks. Social network analysis is being used in various fields, such as law firms, medicine companies, and financial institutions, besides being used in research and development organizations. The uses of decision making based on social network analysis range from expert assessment and criminal investigation to community understanding. The analysis is typically focused on the network information.

Besides NetVizer, there are other tools, such as the one proposed by Chung et al. The paper presents a crime analysis tool for dynamic visualization of events.
This tool has been proposed as an improvement on previous visualization tools, which were manual and less efficient. It finds spatio-temporal patterns of crimes and visualizes them. Previously, network charts were used for crime analysis and were drawn manually, whereas other software applications were too difficult to use and interpret. Therefore, there is a need to develop automatic crime detection and analysis tools which are easy to use and whose output is easy to interpret.

Rather than relying completely on tools, there have to be ways which help in detecting network loopholes. These include work by Van der Aalst and Medeiros, which suggests the frequent checking of audit trails in any organization to help detect security breaches and anomalies. Audit trails are used for analyzing security violations in systems where processes log their events with time stamps, indicating the causality of events by recording the time of occurrence. The paper presents an alpha-algorithm which can be used to support security at various levels of a system, from process execution to conformance checking. This algorithm uses process mining techniques for storing and monitoring audit trails in organizations.

Chau and Xu present a semi-automated approach to identify certain groups by studying their important structural characteristics. People with certain opinions and emotions are studied, such as those forming hate groups. Over the internet, there are various social networks which can be used to propagate opinions, emotions, and beliefs, thereby influencing other people. These methods have traditionally gained considerable success. A number of techniques, including web mining and social network analysis, were used to study crimes over the internet, such as the formation of extremist groups and other terrorist organizations.
Social network analysis has thus clearly been extremely helpful in exploring networks and their characteristics, organizational and inter-organizational behavior, and many other domains. It also helps in identifying central nodes based on their functional roles in networks.

Besides tools and algorithms, models such as those based on clustering can also be used in tracing crime patterns. Ahmad et al. identify gold farmers in gaming social networks. These are involved in the illegal practice of buying and selling virtual goods in online games for real money. The work employs mining techniques for the detection of such groups in networks.

Ball notes that social network analysis can be a very effective automated tool in counter-terrorism research. Kukkala et al. demonstrate privacy-preserving social network analysis in distributed social networks. Ongkowijoyo and Doloi have employed social network analysis to understand risks to infrastructure. Colladon and Remondi have used social network analysis to detect and prevent money laundering.

Results & Discussion

In this section, we present a proof of concept for the proposed model, performed on the empirical data set of the Zachary Karate Club network. This is followed by a detailed discussion and analysis of various possible scenarios resulting from tampering in the light of the proposed tamper-evident algorithm.

Data sets

The empirical data set has been collected from a published real-world network. This data set contains 34 individuals bonded with each other, forming a social network responsible for diffusing information in the network. The published data set has been used for experimentation to provide a proof of concept for the proposed model and algorithms.

We employ a combination of centralities on the collected network, because centrality measures identify the critical positions of nodes in a network and give a mathematical value to these positions. If the position of any node changes in the network, whether through the addition or deletion of a vertex or an edge, so does its centrality value. We take hash values of the centrality measures to maintain the integrity of the measured values.

Building A Tamper-Evident Model

Scenario I: Original Network

We test the original network without altering the integrity of any node and compare the hash values calculated before and after. Figure 1 shows the procedure followed to authenticate and protect the integrity of the network. The data is used to form a network, which serves as the baseline for the rest of the scenarios, after which the centralities $C(n)$ are calculated. Then, we use a merging algorithm to merge all centralities, since even the slightest change in the network will affect the centrality values. To preserve the integrity of the network, we calculate the hash value $h_c$ of the merged centralities $M_c$.

Figure 1. Methodology Framework for Original Network.
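The baseline pipeline of Scenario I (network, centralities, textual merge, hash) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a small toy graph rather than the Karate Club data, computes only three of the five centralities (degree, closeness, eccentricity), and the helper names, the six-decimal rounding, and the `;`/`,` separators used for the textual merge are all hypothetical choices.

```python
import hashlib
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} (hypothetical helper)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def centralities(adj):
    """Degree, closeness and eccentricity per node.

    Assumes a connected graph with at least two nodes.
    """
    n = len(adj)
    result = {}
    for v in sorted(adj):
        dist = bfs_distances(adj, v)
        total = sum(dist.values())
        result[v] = (
            len(adj[v]) / (n - 1),          # degree: k_v / (n - 1)
            1.0 / total if total else 0.0,  # closeness: 1 / sum of distances
            max(dist.values()),             # eccentricity: largest distance
        )
    return result


def network_hash(adj):
    """Textually merge the centralities (M_c) and return their SHA-1 digest (h_c)."""
    merged = ";".join(
        f"{v}:{d:.6f},{c:.6f},{e}" for v, (d, c, e) in centralities(adj).items()
    )
    return hashlib.sha1(merged.encode("utf-8")).hexdigest()


# Toy baseline network (an assumption, not the paper's data set).
baseline_edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
baseline = network_hash(build_adjacency(baseline_edges))
```

Recomputing `network_hash` on an unchanged edge list reproduces `baseline`, while adding or removing a single edge shifts at least one centrality value and therefore the digest.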

Scenario II: Valid Modification Network

We assume here that ties/links between the nodes can change over time. Such a change is authorized and needs to be handled carefully so as to differentiate it from changes made by attackers. Therefore, all authorized changes are recorded at regular intervals using an update algorithm.

Figure 2 shows the procedure followed to authenticate and protect the integrity of the network. Authorized changes are performed on the network (NW) to form a new network, after which the centralities $C(n)$ are calculated. Then, we use the merging algorithm to merge all centralities of the new network. To protect the integrity of the network, we calculate the hash value $h_c$ of the merged centralities $M_c$. We use the update algorithm to store the newly calculated values of the new network. We added and deleted links naturally, updated the centrality values simultaneously, and verified the hash values to see whether the suggested model can work for dynamic networks.

Figure 2. Methodology Framework to Valid Modified Network.

Scenario III: Tampered Network

For this scenario, we consider two situations. In the first, the attacker compromises a single node in the network; in the second, to avoid detection at any point, the attacker deletes the whole network.

Figure 3 shows the formal procedure followed to authenticate and protect the integrity of the network. The attacker's changes are performed on the network (NW), forming a new network, after which the centralities $C(n)$ are calculated for the attacked network. The textual merge algorithm is used to merge all centralities of the attacked network. To evaluate the integrity of the network, we calculate the hash value $h_{cT}$ of the textually merged centralities of the attacked network, $M_{cT}$. We use the comparison algorithm to compare the newly calculated hash value of the attacked network with the stored value of the original network; tampering is indicated when $h_c \neq h_{cT}$.

Our experiments demonstrate that each of the centrality measures has a unique effect on the nodes of networks. Therefore, our proposed model can easily detect minor changes in the network structure.

Features of Tamper-evident Model

Our tamper-evident model is driven by three main objectives:

1. Tamper detection: Any unauthorized change will be detected, and the source of the change will be indicated.
2. Independence: The proposed model does not require any cooperation from monitored systems and does not depend on other components installed for monitoring network activities.
3. Lightweight verification: The proposed model is efficient for monitoring networks and can be easily integrated into systems and other system applications.

Theorem 1

Tamper-evident network algorithms can be implemented with a time complexity of $O(n)$.

Our proposed model satisfies the goals mentioned above. To show this, we have applied the proposed mechanism to empirical networks. Details are given in the section on "Methods".

Methods

In this paper, we propose cognitive digital footprinting in social networks for building a tamper-evident model. For this we have applied the basic digital footprinting concept, in which any unauthorized tampering with any node is easily detectable. In this section we present the tamper-evident storage mechanism in the form of several algorithms.

Figure 3. Methodology Framework to Tampered Network.

Main Algorithm

The following presents the Main algorithm, which defines the overall working of the proposed tamper-evident model. It involves further algorithms explained later. The functionality of the proposed algorithm can be described as follows:

1. Generate the network from the data collected for network formation.
2. Run the Node-Safe Hash (NSH) algorithm, discussed later.
3. Check whether the changes were made by an authenticated/valid user.
4. If the changes are from a valid user, execute the Tamper-Check (TC) algorithm, discussed later.
5. The TC algorithm checks for tampering in the network. If the network has been tampered with, it generates an alarm and notifies the admin to take appropriate action. Otherwise, it replaces the old hash values with the new hash values and the loop continues.
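The control flow of these steps might look like the sketch below. The steps leave some room for interpretation, so this is only one possible reading: it assumes that changes from a valid user refresh the stored hashes (as in Scenario II) and that any other difference from the stored hashes raises an alarm (as in Scenario III). The function names are hypothetical, and `node_hashes` is a deliberately simplified stand-in for NSH that hashes degree centrality only.

```python
import hashlib


def node_hashes(adj):
    """Simplified stand-in for NSH: SHA-1 of each node's degree centrality only."""
    n = len(adj)
    return {
        v: hashlib.sha1(f"{v}:{len(nbrs) / (n - 1):.6f}".encode("utf-8")).hexdigest()
        for v, nbrs in adj.items()
    }


def handle_change(stored, adj, authorized):
    """Return (hash_store, alarm) after the network has been modified.

    Assumption: authorized edits replace the old hashes with the new ones;
    any other mismatch keeps the trusted store and raises the alarm.
    """
    current = node_hashes(adj)
    if authorized:
        return current, False   # valid change: update old hash values
    if current != stored:
        return stored, True     # mismatch: notify the administrator
    return stored, False        # nothing changed


# Step 1-2: build a toy network and store its baseline hashes.
network = {1: {2, 3}, 2: {1}, 3: {1}}
store = node_hashes(network)

# An authorized user links nodes 2 and 3; the store is refreshed, no alarm.
network[2].add(3)
network[3].add(2)
store, alarm_after_valid_edit = handle_change(store, network, authorized=True)

# An attacker removes the link between nodes 1 and 2; the alarm is raised.
network[1].discard(2)
network[2].discard(1)
store, alarm_after_attack = handle_change(store, network, authorized=False)
```

Keeping the trusted store untouched when the alarm fires is a design choice of this sketch, so that the administrator still has the last known-good hashes to compare against.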

Figure 4. Main Algorithm.

Node-Safe Hash (NSH) Algorithm

The Node-Safe Hash algorithm calculates the hash value of each node and saves it appropriately. It proceeds as follows:

1. Take the complete data of nodes collected for network formation.
2. Execute the Calculate Centralities (CC) algorithm, discussed later.
3. Take all nodes and save their calculated hash values.
4. If the current node is not the last node, take the next node and save its associated hash; otherwise, stop.
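A per-node version of these steps could be sketched as below. It is an illustration under assumptions rather than the published implementation: only degree and closeness are merged for each node, the `|` separator and six-decimal rounding are arbitrary, and the function names are hypothetical. The graph is assumed to be connected with at least two nodes.

```python
import hashlib
from collections import deque


def closeness(adj, v):
    """Closeness of v as 1 / (sum of hop distances to all reachable nodes)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return 1.0 / total if total else 0.0


def node_safe_hash(adj):
    """Visit every node in turn and save the hash of its merged centralities."""
    n = len(adj)
    store = {}
    for v in sorted(adj):  # loop ends after the last node
        merged = f"{len(adj[v]) / (n - 1):.6f}|{closeness(adj, v):.6f}"
        store[v] = hashlib.sha1(merged.encode("utf-8")).hexdigest()
    return store


# Path graph 1 - 2 - 3: the two end nodes are structurally equivalent.
path_hashes = node_safe_hash({1: {2}, 2: {1, 3}, 3: {2}})
```

Because the node identifier is not part of the merged string in this sketch, structurally equivalent nodes (here nodes 1 and 3) receive the same hash; including the identifier would make every entry unique.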

Figure 5. Node-Safe Hash (NSH) Algorithm.

Calculate Centralities (CC) Algorithm

The Calculate Centralities algorithm calculates the hash values of all centralities by the following procedure:

1. Calculate all the centralities: Degree Centrality $C_D$, Betweenness Centrality $C_B$, Closeness Centrality $C_C$, Eccentricity Centrality $C_{Ecc}$, and Eigenvector Centrality $C_{Ei}$.
2. Perform a textual merge to combine the centralities.
3. Apply the SHA1 algorithm to the merged centralities to provide an integrity check.

Figure 6. Calculate Centrality (CC) Algorithm.

Tamper-Check (TC) Algorithm

The following describes the working of the Tamper-Check algorithm, which checks whether the network has been tampered with. It works as follows:

1. Run the NSH algorithm to calculate and save the hash values of the network nodes.
2. Check whether the calculated hash values are equal to the previously calculated hash values.
3. If the newly calculated hash values are not equal to the old hash values, the network has been tampered with; otherwise, it has not.
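The comparison in steps 2 and 3 can be illustrated with a short sketch. It assumes the old and new hashes are held in dictionaries keyed by node (a representation not specified in the text), and the name `tamper_check` and the placeholder hash strings are hypothetical. Returning the affected nodes, rather than a bare yes/no, is an extension chosen here to point toward the source of a change.

```python
def tamper_check(old_hashes, new_hashes):
    """Return the sorted nodes whose hash changed, appeared, or disappeared.

    An empty list means the stored and recomputed hashes agree.
    Node keys are assumed to be mutually comparable (e.g. all strings).
    """
    suspects = [
        v
        for v in old_hashes.keys() | new_hashes.keys()
        if old_hashes.get(v) != new_hashes.get(v)
    ]
    return sorted(suspects)


# Placeholder digests stand in for real per-node hashes.
stored = {"a": "h1", "b": "h2", "c": "h3"}
recomputed = {"a": "h1", "b": "XX", "d": "h4"}

changed = tamper_check(stored, recomputed)  # b modified, c removed, d added
tampered = bool(changed)
```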

Figure 7. Tamper-check (TC) Algorithm.

Calculate SHA1 Algorithm

SHA1 is used for preserving the integrity of data contents. We use hashing algorithms to verify whether the contents have been changed or modified, and to detect masqueraders who insert messages from fraudulent sources. This also detects whether the content has been modified by insertion, deletion, or reordering, and whether timing modification, which is used for replaying valid sessions, has occurred. The algorithm has been implemented in our proposed solution as follows:

1. Take the textual merge of centralities as input.

2. Add padding bits to the input to make its length congruent to 448 mod 512, that is, append a single 1 bit followed by as many 0 bits as needed.

3. Append a 64-bit length field to the padded textual merged centralities. These 64 bits hold the length of the original message in binary.

4. Prepare processing functions. This requires 80 processing functions, defined as follows:

    f(t; B, C, D) = (B AND C) OR ((NOT B) AND D)          (0 <= t <= 19)
    f(t; B, C, D) = B XOR C XOR D                         (20 <= t <= 39)
    f(t; B, C, D) = (B AND C) OR (B AND D) OR (C AND D)   (40 <= t <= 59)
    f(t; B, C, D) = B XOR C XOR D                         (60 <= t <= 79)

5. Prepare processing constants. 80 processing constants are required to produce 5 words, defined as follows:

    K(t) = 0x5A827999   (0 <= t <= 19)
    K(t) = 0x6ED9EBA1   (20 <= t <= 39)
    K(t) = 0x8F1BBCDC   (40 <= t <= 59)
    K(t) = 0xCA62C1D6   (60 <= t <= 79)

6. Initialize message digest buffers. In this integrity check algorithm, SHA1 requires 160 bits, or 5 buffers of words (32 bits each):

    H0 = 0x67452301
    H1 = 0xEFCDAB89
    H2 = 0x98BADCFE
    H3 = 0x10325476
    H4 = 0xC3D2E1F0

7. Process the message in 512-bit blocks (L blocks in the total message). This is the core of the SHA1 algorithm, which loops through the padded and appended message in 512-bit blocks.

Input and predefined functions:

    M[1, 2, ..., L]: blocks of the padded and appended message
    f(0; B, C, D), f(1; B, C, D), ..., f(79; B, C, D): 80 processing functions
    K(0), K(1), ..., K(79): 80 processing constant words
    H0, H1, H2, H3, H4: 5 word buffers with initial values

SHA1 works as:

    For k = 1 to L do:
        (W(0), W(1), ..., W(15)) = M[k]    /* divide M[k] into 16 words */
        For t = 16 to 79 do:
            W(t) = (W(t-3) XOR W(t-8) XOR W(t-14) XOR W(t-16)) <<< 1
        A = H0, B = H1, C = H2, D = H3, E = H4
        For t = 0 to 79 do:
            TEMP = (A <<< 5) + f(t; B, C, D) + E + W(t) + K(t)
            E = D, D = C, C = B <<< 30, B = A, A = TEMP
        End of for loop
        H0 = H0 + A, H1 = H1 + B, H2 = H2 + C, H3 = H3 + D, H4 = H4 + E
    End of for loop

Output:

    H0, H1, H2, H3, H4: word buffers with the final message digest.
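The seven steps above can be transcribed into a short, runnable sketch and checked against a library implementation. This is an illustrative transcription, not the authors' code; the names `rotl` and `sha1_hex` are hypothetical, and Python's standard `hashlib.sha1` is used only as a reference for comparison.

```python
import hashlib
import struct


def rotl(x, n):
    """Rotate a 32-bit word left by n bits (the <<< operator)."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


def sha1_hex(message: bytes) -> str:
    """SHA-1 digest of message as a 40-character hex string."""
    # Steps 2-3: append a 1 bit, zero-pad to 448 mod 512 bits,
    # then append the original length as a 64-bit big-endian integer.
    bit_len = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", bit_len)

    # Step 6: initial message digest buffers H0..H4.
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    # Step 7: process each 512-bit (64-byte) block.
    for start in range(0, len(padded), 64):
        w = list(struct.unpack(">16I", padded[start:start + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))

        a, b, c, d, e = h
        for t in range(80):
            # Steps 4-5: round-dependent function f and constant K.
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + f + e + w[t] + k) & 0xFFFFFFFF
            a, b, c, d, e = temp, a, rotl(b, 30), c, d

        # Add this block's result to the running digest, modulo 2^32.
        h = [(x + y) & 0xFFFFFFFF for x, y in zip(h, (a, b, c, d, e))]

    return "".join(f"{x:08x}" for x in h)


# The merged-centrality string below is a made-up example input.
digest = sha1_hex(b"1:1.000000,0.333333,1")
matches_library = digest == hashlib.sha1(b"1:1.000000,0.333333,1").hexdigest()
```

In practice the library routine would normally be preferred; the hand-written version is useful mainly for following the padding, message schedule, and round structure step by step.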

Figure 8. Secure Hash (SHA1) Algorithm.

Conclusion

In this paper, we have presented a tamper-evident mechanism for complex networks in general, and complex genomic networks in particular. We have demonstrated the model using a proof-of-concept validation scheme on actual data sets. Our proposed model detects unauthorized changes in a given network. Experiments carried out on a real-world network have also been presented to examine how network changes can be easily detected using the proposed approach. We have used the centralities most commonly used for detecting critical positions of nodes in complex networks. The experiments clearly demonstrate that the model can be easily implemented in complex networks for detecting any changes in the network. Our experimental results show that even a minor change in network structure is easily detectable through this model.

A limitation of the proposed model is that it currently works only on the network structure and not on node attributes or node information. In the future, the proposed technique can be extended to include node-based information. Further cryptographic techniques can also be included in the proposed mechanisms.

Author contributions statement

M.N. conceived the experiments and analyzed the results. K.B. conducted the experiments and analyzed the results. All authors reviewed the manuscript.

Competing interests

The authors declare that they have no competing financial interests.

Correspondence