No Place to Hide: Catching Fraudulent Entities in Tensors
Yikun Ban∗
Peking University
[email protected]

Xin Liu
Tsinghua University
[email protected]

Yitao Duan

Xue Liu
McGill University
[email protected]

Wei Xu
Tsinghua University
[email protected]
ABSTRACT
Many approaches focus on detecting dense blocks in the tensor of multimodal data to prevent fraudulent entities (e.g., accounts, links) from retweet boosting, hashtag hijacking, link advertising, etc. However, no existing method is effective at finding a dense block if it only possesses high density on a subset of all dimensions of the tensor. In this paper, we reduce dense-block detection to dense-subgraph mining by modeling a tensor as a weighted graph without losing any density information. Based on this weighted graph, which we call the information sharing graph (ISG), we propose D-Spot, an algorithm for finding multiple densest subgraphs that is faster (up to 11x faster than the state-of-the-art algorithm) and can be computed in parallel. In an N-dimensional tensor, the entity group found by ISG+D-Spot is at least 1/2 of the optimum with respect to density, compared with the 1/N guarantee ensured by competing methods. We use nine datasets to demonstrate that ISG+D-Spot is a new state-of-the-art dense-block detection method in terms of accuracy, specifically for fraud detection.
CCS CONCEPTS
• Information systems → Wrappers (data mining).

KEYWORDS
Dense-block Detection; Graph Algorithms; Fraud Detection
ACM Reference Format:
Yikun Ban, Xin Liu, Yitao Duan, Xue Liu, and Wei Xu. 2019. No Place to Hide: Catching Fraudulent Entities in Tensors. In
Proceedings of the 2019 World Wide Web Conference (WWW '19), May 13–17, 2019, San Francisco, CA, USA.
ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3308558.3313403
∗ Yikun Ban did this project during his visit to Tsinghua University.

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.

WWW '19, May 13–17, 2019, San Francisco, CA, USA
© 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-6674-8/19/05.
https://doi.org/10.1145/3308558.3313403

1 INTRODUCTION

Fraud represents a serious threat to the integrity of social or review networks such as Twitter and Amazon, with people introducing fraudulent entities (e.g., fake accounts, reviews, etc.) to gain more publicity/profit over a brief period. For example, on a social network or media sharing website, people may wish to enhance their account's popularity by illegally buying more followers [27]; on e-commerce websites, fraudsters may register multiple accounts to benefit from "new user" promotions.

Consider the typical log data generated from a social review site (e.g., Amazon), which contains four-dimensional features: users, products, timestamps, and rating scores. These data are often formulated as a tensor, in which each dimension denotes a separate feature and an entry (tuple) of the tensor represents a review action. Based on previous studies [12, 30], fraudulent entities form dense blocks (sub-tensors) within the main tensor, such as when a mass of fraudulent user accounts creates an enormous number of fake reviews for a set of products over a short period. Dense-block detection has also been applied to network intrusion detection [20, 30], retweet boosting detection [12], bot activity detection [30], and genetics applications [20, 26].

Various dense-block detection methods have been developed. One approach uses tensor decomposition, such as CP decomposition and higher-order singular value decomposition [20]. However, as observed in [32], such methods are outperformed by search-based techniques [12, 30, 32] in terms of accuracy, speed, and flexibility regarding support for different density metrics. Furthermore, [30, 32] provide an approximation guarantee for finding the densest/optimal block in a tensor.

We have examined the limitations of search-based methods for dense-block detection. First, these methods are incapable of detecting hidden-densest blocks. We define a hidden-densest block as one that does not have a high-density signal on all dimensions of a tensor, but evidently has a high density on a subset of all dimensions. Moreover, existing methods neglect the data type and distribution of each dimension of the tensor.
Assume that two dense blocks A and B have the same density; however, A is the densest on a subset of critical features, such as IP address and device ID, whereas B is the densest on some trivial features such as age and gender. Can we simply believe that A is as suspicious as B? Unfortunately, the answer given by existing methods is 'yes.'

To address these limitations, we propose a dense-block detection framework that focuses on entities forming dense blocks in tensors. The proposed framework is designed from a novel perspective: given a tensor, the formation of dense blocks is the result of value sharing (the behavior whereby two or more different entries share a distinct value (entity) in the tensor). Based on this key point, we propose a novel
Information Sharing Graph (ISG) model, which accurately captures each instance of value sharing. The transformation from dense blocks in a tensor to dense subgraphs in the ISG leads us to propose a fast, high-accuracy algorithm, D-Spot, for determining fraudulent entities with a provable guarantee regarding the densities of the detected subgraphs.

In summary, the main contributions of this study are as follows:

1) [Graph Model]. We propose the novel ISG model, which converts every instance of value sharing in a tensor into the representation of weighted edges or nodes (entities). Furthermore, our graph model considers diverse data types and their corresponding distributions based on information theory to automatically prioritize multiple features.

2) [Algorithm].
We propose the D-Spot algorithm, which is able to find multiple densest subgraphs in one run. We theoretically prove that the multiple subgraphs found by D-Spot must contain some subgraphs that are at least 1/2 as dense as the optimum. In real-world graphs, D-Spot is up to 11x faster than the state-of-the-art competing algorithm.

3) [Effectiveness]. In addition to dense blocks, ISG+D-Spot also effectively differentiates hidden-densest blocks from normal ones. In experiments using eight public real-world datasets, ISG+D-Spot detected fraudulent entities more accurately than conventional methods.
As most fraudulent schemes are designed for financial gain, it is essential to understand the economics behind the fraud. Only when the benefits to a fraudster outweigh their costs will they perform a scam.

To maximize profits, fraudsters have to share/multiplex different resources (e.g., fake accounts, IP addresses, and device IDs) over multiple frauds. For example, [13] found that many users are associated with a particular group of followers on Twitter; [36] identified many cases of phone number reuse; [4] observed that the IP addresses of many spam proxies and scam hosts fall into a few uniform ranges; and [38] revealed that fake accounts often conduct fraudulent activities over a short time period. Thus, fraudulent activities often form dense blocks in a tensor (as described below) because of this resource sharing.
Search-based dense-block detection in tensors.
Previous studies [12, 20, 30] have shown the benefit of incorporating features such as timestamps and IP addresses, which are often formulated as a multi-dimensional tensor. Mining dense blocks with the aim of maximizing a density metric on tensors is a successful approach. CrossSpot [12] randomly chooses a seed block and then greedily adjusts it in each dimension until the local optimum is attained. This technique usually requires enormous numbers of seed blocks and does not provide any approximation guarantee for finding the global optimum. In contrast to adding feature values to seed blocks, M-Zoom [30] removes feature values from the initial tensor one by one using a similar greedy strategy, providing a 1/N-approximation guarantee for finding the optimum (where N is the number of dimensions in the tensor). M-Biz [31] also starts from a seed block and then greedily adds or removes feature values until the block reaches a local optimum. Unlike M-Zoom, D-Cube [32] deletes a set of feature values in each step to reduce the number of iterations, and is implemented in a distributed disk-based manner. D-Cube provides the same approximation guarantee as M-Zoom.

Table 1: ISG+D-Spot vs. existing dense-block detection methods.

                                    MAF [20]  CrossSpot [12]  M-Zoom [30]  M-Biz [31]  D-Cube [32]  ISG+D-Spot
Applicable to N-dimensional data?      √           √              √            √           √            √
Catch densest blocks?                  √           √              √            √           √            √
Catch hidden-densest blocks?           ×           ×              ×            ×           ×            √
Approximation guarantee?               ×           ×             1/N           ×          1/N          1/2

Tensor decomposition methods.
Tensor decomposition [17] is often applied to detect dense blocks within tensors [20]. Scalable algorithms, such as those described in [23, 33, 37], have been developed for tensor decomposition. However, as observed in [12, 32], these methods are limited regarding the detection of dense blocks: they usually detect blocks with significantly lower densities, provide less flexibility with regard to the choice of density metric, and do not provide any approximation guarantee.
Dense-subgraph detection.
A graph can be represented by a two-dimensional tensor, where an edge corresponds to a non-zero entry in the tensor. The mining of dense subgraphs has been extensively studied [18]. Detecting the densest subgraph is often formulated as finding the subgraph with the maximum average degree, and may use exact algorithms [10, 16] or approximate algorithms [6, 16]. Fraudar [11] is an extended approximate algorithm that can be applied to fraud detection in social or review graphs. CoreScope [29] tends to find dense subgraphs in which all nodes have a degree of at least k. Implicitly, singular value decomposition (SVD) also focuses on dense regions in matrices. EigenSpoke [24] reads scatter plots of pairs of singular vectors to find patterns and chip communities, [7] extracts dense subgraphs using a spectral cluster framework, and [14, 27] use the top eigenvectors from SVD to identify abnormal users.

Other anomaly/fraud detection methods.
The use of belief propagation [3, 22] and HITS-like ideas [8, 9, 13] is intended to catch rare behavior patterns in graphs. Belief propagation has been used to assign labels to the nodes in a network representation of a Markov random field [3]. When adequate labeled data are available, classifiers can be constructed based on multi-kernel learning [2], support vector machines [35], and k-nearest neighbor [34] approaches.

In this section, we introduce the notations and definitions used throughout the paper, analyze the limitations of existing approaches, and describe our key motivations.
Table 2 lists the notations used in this paper. We use [N] = {1, ..., N} for brevity. Let R(A_1, ..., A_N, X) = {t_1, ..., t_{|X|}} be a relation with N dimensional features, denoted by {A_1, ..., A_N}, and a dimension of entry identifiers, denoted by X. For each entry (tuple) t ∈ R, t = (a_1, ..., a_N, x), where ∀n ∈ [N], we use t[A_n] to denote the value of A_n in t, t[A_n] = a_n, and t[X] to denote the identification of t, t[X] = x, x ∈ X. We define the mass of R as |R|, the total number of such entries, |R| = |X|. For each n ∈ [N], we use R_n to denote the set of distinct values of A_n. Thus, R naturally represents an N-dimensional tensor of size |R_1| × ... × |R_N|.

Table 2: Symbols and Definitions

Symbol                      Interpretation
N                           number of dimensions in a tensor
[N]                         the set {1, ..., N}
R(A_1, ..., A_N, X)         relation representing a tensor
A_n                         n-th dimension of R
R_n                         set of distinct values of A_n in R
t = (a_1, ..., a_N, x)      an entry (tuple) of R
B(A_1, ..., A_N, X')        a block in R
B_n                         set of distinct values of A_n in B
U                           target dimension in R
V = {u_1, ..., u_n}         set of distinct values of U
G = (V, E)                  information sharing graph (ISG)
S_{i,j}                     S-score between u_i and u_j
S_i                         S-score of u_i
G' = (V', E')               subgraph of G
F                           density metric

A block B in R is defined as B(A_1, ..., A_N, X') = {t ∈ R : t[X] ∈ X'}, where X' ⊆ X. Additionally, the mass |B| is the number of entries of B, and B_n is the set of distinct values of A_n in B. Let B(a, A_n) = {t ∈ R : t[A_n] = a} represent all entries that take the value a on A_n. The mass |B(a, A_n)| is the number of such entries. A simple example is given as follows.

Example 1 (Amazon review logs).
Assume a relation R(user, product, timestamp, X), where ∀t ∈ R, t = (a_1, a_2, a_3, x) indicates a review action in which user a_1 reviews product a_2 at timestamp a_3, and the identification of the action is x. Because a_1 may review a_2 at a_3 (we assume that a_3 represents a period) multiple times, X helps us distinguish each such action. The mass of R, denoted by |R|, is the number of all review actions in the dataset. The number of distinct users in R is |R_1|. A block B(a_1, user) is the set of all rating entries operated by user a_1, and the number of overall entries of B(a_1, user) is |B(a_1, user)|.

First, we present a density metric that is known to be useful for fraud detection [30, 32]:

Definition 1. (Arithmetic Average Mass ρ). Given a block B(A_1, ..., A_N, X'), the arithmetic average mass of B on a subset of dimensions N' ⊆ [N] is

ρ(B, N') = |B| / ((1/|N'|) Σ_{n∈N'} |B_n|),

and obviously ρ ∈ [0, +∞). If block B is dense in R, then ρ(B, [N]) > 1.0.

Other density metrics listed in [32] are also effective for fraud detection. It is broadly true that all density measures are functions of the cardinalities of the dimensions and the masses of B and R. In R, previous studies [12, 30–32] have focused on detecting the top-k densest blocks in terms of a density metric. In the remainder of this paper, we use the density metric ρ to illustrate our key points.

In practice, the blocks formed by fraudulent entities in R may be described as hidden-densest blocks. To illustrate hidden-densest blocks, we present the following definitions and examples.
[Figure omitted: (a) Relation R with target dimension; (b) ISG; (c) fraudulent entities found by D-Spot.]
Figure 1: Workflow of ISG+D-Spot.
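As a concrete illustration of Definition 1, the arithmetic average mass can be computed directly from a block's mass and its per-dimension value sets. The toy block below is our own example, not from the paper:

```python
def rho(mass, value_sets, dims):
    """Arithmetic average mass (Definition 1): |B| divided by the
    mean number of distinct values over the chosen dimensions."""
    avg_card = sum(len(value_sets[n]) for n in dims) / len(dims)
    return mass / avg_card

# Toy block: 60 entries over 3 users, 2 products, and 2 timestamps
block = {0: {"u1", "u2", "u3"}, 1: {"p1", "p2"}, 2: {"t1", "t2"}}
print(rho(60, block, [0, 1, 2]))  # 60 / ((3+2+2)/3) ≈ 25.71
```

A value well above 1.0, as here, signals a dense block under ρ.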
Definition 2. In R(A_1, ..., A_N, X), we say that B(A_1, ..., A_N, X') is the densest on a dimension A_n if ρ(B, {n}) is the maximal value of all possible ρ(B̂, {n}), where B̂ is any possible block in R.

Definition 3. (Hidden-Densest Block). In R(A_1, ..., A_N, X), B(A_1, ..., A_N, X') is a hidden-densest block if B is the densest on only a small subset of {A_1, ..., A_N}.

Example 2 (Registration logs).
In a registration dataset with 19 features, fake accounts only exhibit conspicuous resource sharing with respect to the IP address feature.
Example 3 (TCP dumps).
The DARPA dataset [1] has 43 features, but the block formed by malicious connections is only the densest on two features.
Thus, catching hidden-densest blocks has significant utility in the real world. Unfortunately, the problem is intractable using existing approaches [12, 30–32].

First, assuming that the hidden-densest block B(A_1, ..., A_N, X') is only the densest on dimension A_N, we have that

ρ(B, [N−1]) = |B| / ((1/(N−1)) Σ_{n∈[N−1]} |B_n|) ≈ ρ(B, [N]) = |B| / ((1/N) Σ_{n∈[N]} |B_n|)

when N is sufficiently large. Assuming ρ(B, [N−1]) is very low, the methods in [12, 30–32], which try to find the block B that maximizes ρ(B, [N]), have a limited ability to detect the hidden-densest block.

Second, consider a block B formed by fraudulent entities, in which B is only the densest on {A_1, A_2, A_3}, and thus ρ(B, {1, 2, 3}) is maximal. However, the techniques of [12, 30–32] cannot find {A_1, A_2, A_3} when the number of feature combinations explodes. Furthermore, in R(A_1, ..., A_N, X), consider two blocks B_1 and B_2, where B_1 is the densest on A_i, B_2 is the densest on A_j, and ρ(B_1, [N]) = ρ(B_2, [N]). Does this indicate that B_1 and B_2 are equally suspicious? No, absolutely not, because A_i could be the IP address feature and A_j could be a trivial feature such as the user's age, location, or gender.

[Value Sharing]. Based on the considerations above, we design our approach from a different angle. The key reason behind the formation of dense blocks is value sharing. Given t ∈ R, a dimension A_n, and t[A_n] = a, we can identify value sharing when ∃t' ∈ R, t' ≠ t, and t'[A_n] = a. Obviously, if a block B is dense, ρ(B, [N]) > 1.0, then value sharing must be occurring; i.e., value sharing results in dense blocks. Therefore, detecting dense blocks is equivalent to catching value-sharing signals. We propose the ISG based on information theory and design the D-Spot algorithm to leverage graph features, allowing us to catch fraudulent entities within dense blocks and overcome the limitations mentioned above.
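The dilution effect of averaging over all N dimensions can be demonstrated numerically. The cardinalities below are our own hypothetical example, chosen only to illustrate the argument:

```python
def rho(mass, cards):
    """Density from a block's mass and per-dimension cardinalities."""
    return mass / (sum(cards) / len(cards))

# Hypothetical block with 1000 entries that is extremely dense on one
# dimension (5 distinct values) but ordinary on the other N-1
# dimensions (1000 distinct values each):
N = 20
with_dense = [1000] * (N - 1) + [5]
without_dense = [1000] * (N - 1)
print(rho(1000, with_dense), rho(1000, without_dense))
# ≈ 1.05 vs 1.00: the single dense dimension barely moves the average,
# so a ρ-maximizing search over all N dimensions cannot see it.
```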
4 ISG BUILDING
In this section, we present the Information Sharing Graph (ISG), which is constructed from the relation R. Catching fraudulent entities is equivalent to detecting a subset of distinct values in a certain dimension. Let U denote the target dimension in which a subset of distinct values form the fraudulent entities we wish to detect. In R = (A_1, ..., A_N, X), we choose a dimension and set it as U, and denote the remaining (N−1) dimensions as K dimensions, k ∈ [K], for brevity. We build the ISG of U, i.e., the weighted undirected graph G = (V, E), in which V = {u_1, ..., u_n} is the set of distinct values of U.

In Example 1, R(user, product, timestamp, X), we set U = user if we wish to detect fraudulent user accounts. In Example 2, we set U = account if we would like to identify fake accounts. In Example 3, we set U = connection to catch malicious connections.

To specifically describe the process of value sharing, we present the following two definitions:

Definition 4. (Pairwise Value Sharing). Given u_i, u_j ∈ V and a ∈ A_k, we say that u_i and u_j share value a on A_k if ∃t_1, t_2 ∈ R such that t_1[U] = u_i, t_2[U] = u_j, and t_1[A_k] = t_2[A_k] = a.

Pairwise value sharing occurs when a distinct value is shared by multiple individual entities. Given a value-sharing process in which a is shared by V' ⊂ V, we denote this as |V'|(|V'|−1)/2 instances of pairwise value sharing.

Definition 5. (Self-Value Sharing).
Given t ∈ R, where t[U] = u_i, u_i ∈ V, and t[A_k] = a, we say that u_i shares value a on A_k if ∃t' ∈ R, t' ≠ t, such that t'[U] = u_i and t'[A_k] = a.

Another type of value sharing occurs when the distinct value a is shared n times by an entity u_i, which can be represented by n instances of self-value sharing.

In the ISG G = (V, E), for an edge (u_i, u_j) ∈ E, S_{i,j} represents the information between u_i and u_j derived from the other K dimensions, and for a node u_i ∈ V, S_i denotes the information of u_i calculated from the other K dimensions. From the definitions and notations defined in the previous section, Problem 1 gives a formal definition of how to build the ISG of a tensor.

Problem 1 (Building a pairwise information graph). (1) Input: a relation R and the target dimension U. (2) Output: the information sharing graph G = (V, E).

Given a dimension A_k, the target dimension U, any u_i ∈ V, and an entry t ∈ R for which t[U] = u_i, then for each a ∈ A_k, we assume that the probability of t[A_k] = a is p_k(a).

Edge Construction.
Based on information theory [28], the self-information of the event that u_i and u_j share a is:

I^k_{i,j}(a) = log(1 / p_k(a)).   (1)

To compute the pairwise value sharing between u_i and u_j across all K dimensions, we propose the metric S-score as the edge weight of the ISG:

S_{i,j} = Σ_{k=1}^{K} Σ_{a ∈ H_k(u_i, u_j)} I^k_{i,j}(a),   (2)

where H_k(u_i, u_j) is the set of all values shared by u_i and u_j on A_k. Note that S_{i,j} = 0 if ∪_{k=1}^{K} H_k(u_i, u_j) = ∅.

Intuitively, if u_i and u_j do not have any shared values, which is to be expected in normal circumstances, we have zero information. Otherwise, we obtain some information. Thus, the higher the value of S_{i,j}, the more similar u_i is to u_j. In practice, the S-score has a large variance. For example, fraudulent user pairs sharing an IP subnet and device ID will have a high S-score, whereas normal users are unlikely to share these values with anyone, and will thus have an S-score close to zero. Additionally, the information we obtain from u_i and u_j sharing the value a is related to the overall probability of that value. For example, it would be much less surprising if they both follow Donald Trump on Twitter than if they both follow a relatively unknown user.

Node Setting.
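Equations (1) and (2) can be sketched directly in Python. The function and data below are our own illustration, assuming the probabilities p_k(a) are supplied by the caller:

```python
import math

def edge_score(shared, p):
    """S_{i,j} (Eq. 2): total self-information log(1/p_k(a)) over every
    value a shared by u_i and u_j on each dimension k.
    `shared[k]` plays the role of H_k(u_i, u_j); `p[k][a]` is p_k(a)."""
    return sum(math.log(1.0 / p[k][a]) for k in shared for a in shared[k])

# A rare shared IP contributes far more information than a common city:
p = {"ip": {"10.0.0.7": 0.001}, "city": {"Boston": 0.2}}
shared = {"ip": {"10.0.0.7"}, "city": {"Boston"}}
print(edge_score(shared, p))  # log(1000) + log(5) ≈ 8.52
```

The rare IP alone contributes log(1000) ≈ 6.9 of the total, matching the intuition that low-probability sharing dominates the S-score.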
For a node u_i ∈ V, let B(a, A_k, u_i, U) be the set {t ∈ R : (t[A_k] = a) ∧ (t[U] = u_i)}. When |B(a, A_k, u_i, U)| ≥ 2, the self-information of B(a, A_k, u_i, U) is:

I^k_i(a) = log(1 / p_k(a)) · |B(a, A_k, u_i, U)|.   (3)

We now define S_i to compute the self-value sharing for u_i across all K dimensions:

S_i = Σ_{k=1}^{K} Σ_{B(a, A_k, u_i, U) ∈ H_k(u_i)} I^k_i(a),   (4)

where H_k(u_i) is the set {B(a, A_k, u_i, U), ∀a ∈ R_k} and R_k is the set of distinct values of A_k. Note that S_i = 0 if ∪_{k=1}^{K} H_k(u_i) = ∅.

In effect, self-value sharing occurs in certain fraud cases. For instance, a fraudulent user may create several fake reviews for a product/restaurant on Amazon/Yelp [25] over a few days. In terms of network attacks [19], a malicious TCP connection tends to attack a server multiple times.

Determining p_k(a). We can extend the S-score to accommodate different data types and distributions. It is difficult to determine p_k(a), as we do not always know the distribution of A_k. In this case, for dimensions that are attribute features, we assume a uniform distribution and simply set

p_k(a) = 1 / |R_k|.   (5)

This approximation works well for many fraud-related properties such as IP subnets and device IDs, which usually follow a Poisson distribution [12]. However, the uniform assumption works poorly for low-entropy distributions, such as the long-tail distribution, which is common in dimensions such as items purchased or users followed. Low entropy implies that many users behave similarly anyway, independent of fraud. Intuitively, for such distributions, there is no surprise in following a celebrity (the head of the distribution), but considerable information if two users both follow someone at the tail. For example, 20% of users correspond to more than 80% of the "follows" in online social networks. The dense subgraphs between celebrities and their fans are very unlikely to be fraudulent. If feature A_k has a long-tail distribution, its entropy is very low.
For example, the entropy of the uniform distribution over 50 values is 3.91, but the entropy of a long-tail distribution with 90% of the probability mass centered on one value is only 0.71. Therefore, we set p_k(a) based on the empirical distribution as

p_k(a) = |B(a, A_k)| / |R|,   (6)

when the values in A_k have low entropy. We also provide an interface so that users can define their own p_k(a) function.

Optimization of ISG Construction.
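The entropy figures quoted above can be checked numerically (natural-log entropy; the long-tail distribution below is our reconstruction of the example, with 90% of the mass on one value and the rest uniform over 49 values):

```python
import math

def entropy(probs):
    """Shannon entropy with the natural logarithm."""
    return -sum(q * math.log(q) for q in probs if q > 0)

uniform = [1 / 50] * 50                 # uniform over 50 values
long_tail = [0.9] + [0.1 / 49] * 49     # 90% mass on one value
print(round(entropy(uniform), 2), round(entropy(long_tail), 2))  # 3.91 0.71
```

A simple rule, then, is to use Eq. (5) for high-entropy dimensions and switch to the empirical estimate of Eq. (6) when the measured entropy is low.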
In theory, a graph with |V| nodes has O(|V|^2) edges. Naively, therefore, it takes O(|V|^2) time for graph initialization and traversal. To reduce the complexity of building the ISG, we use a key-value approach. The key corresponds to a value a on A_k, and the value represents the block B(a, A_k). Let V' ⊆ V denote the entities that occur in B(a, A_k). As each pair (u_i, u_j) ∈ V' shares a, we increase the value of S_{i,j} by I^k_{i,j}(a). Additionally, for each u_i ∈ V', there exists some B(a, A_k, u_i, U) ⊆ B(a, A_k). Thus, we increase the value of S_i by I^k_i(a) if |B(a, A_k, u_i, U)| ≥ 2. We build the key-value pairs across the K dimensions by traversing R in parallel. Thus, it takes O(K|R| + |E|) time to build the graph G = (V, E). Note that we only retain positive S_{i,j} and S_i. In practice, G is usually sparse, as discussed in Section 6.1.

Given a relation R = (A_1, ..., A_N, X) in which we set U = A_N, we construct the ISG of U, G = (V, E). Assuming there is a fraudulent block B in R, B = (A_1, ..., A_N, X') is transformed into a subgraph G' = (V', E') of G, where V' is the set of distinct values of A_N in B and an edge weight S_{i,j} ∈ E' denotes the information between u_i and u_j calculated from the other K dimensions. Then, V' is the fraud group comprising the fraudulent entities that we wish to detect. We summarize three critical observations of G' that directly lead to the algorithms presented in Section 5.2. Given G' = (V', E'), we define the edge density of G' as

ρ_edge(G') = 2|E'| / (|V'|(|V'| − 1)).
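The key-value construction can be sketched as follows. This is a minimal, sequential Python sketch with our own names (the paper builds the key-value pairs in parallel); entries carry the target value first, and `p[k][a]` supplies p_k(a):

```python
import math
from collections import defaultdict
from itertools import combinations

def build_isg(entries, p):
    """Key-value ISG construction sketch: group entries by (k, a) to
    form each block B(a, A_k), then convert the block into edge
    increments (pairwise sharing, Eq. 2) and node increments
    (self sharing, Eq. 3)."""
    blocks = defaultdict(list)              # (k, a) -> entities in B(a, A_k)
    for t in entries:
        for k, a in enumerate(t[1:]):
            blocks[(k, a)].append(t[0])
    S_edge = defaultdict(float)             # edge weights S_{i,j}
    S_node = defaultdict(float)             # node weights S_i
    for (k, a), us in blocks.items():
        info = math.log(1.0 / p[k][a])      # self-information of value a
        for ui, uj in combinations(sorted(set(us)), 2):
            S_edge[(ui, uj)] += info        # pairwise value sharing
        for u in set(us):
            cnt = us.count(u)
            if cnt >= 2:                    # self-value sharing
                S_node[u] += info * cnt
    return dict(S_edge), dict(S_node)

# Two users share a rare IP; u1 also repeats it (self sharing):
edges, nodes = build_isg(
    [("u1", "ip1"), ("u2", "ip1"), ("u1", "ip1")],
    {0: {"ip1": 0.01}})
```

Grouping by (k, a) first means each block's pairs are enumerated only once, which is where the O(K|R| + |E|) bound comes from when the blocks are small.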
1) The value of S_{i,j} or S_i is unusually high. Value sharing may happen frequently, but sharing across certain features, even certain values, is more suspicious than others. Intuitively, it might be suspicious if two users share an IP address or follow the same random "nobody" on Twitter. However, it is not so suspicious if they have a common gender or city, or follow the same celebrity. In other words, certain value sharing is likely to be fraudulent because the probability of sharing across a particular dimension, or at a certain value, is quite low. Thus, the information value is high, which is accurately captured by S_{i,j} and S_i.

2) |V'| is usually large. Fraudsters perform the same actions many times to achieve economies of scale. Thus, we expect to find multiple pairwise complicities among fraudulent accounts. A number of studies have found that large cluster sizes are a crucial indicator of fraud [5, 38]. Intuitively, while it is natural for a few family members to share an IP address, it is highly suspicious when dozens of users share one.
3) The closer ρ_edge(G') is to 1.0, the more suspicious G' is. Fraudsters usually operate a number of accounts for the same job, and thus it is likely that users manipulated by the same fraudster will share the same set of values. Thus, the G' formed by a fraud group will be well-connected.

Appearance of legitimate entities in the ISG. In G = (V, E), given some u_i that we assume to be legitimate, let h(u_i) denote the set of its neighbor nodes. We have two findings. (1) For u_i, S_i + Σ_{u_j ∈ V} S_{i,j} → 0, because u_i is unlikely to share values with others. Even if such sharing exists, the shared values should have a high probability (see observation 1) and therefore small S_{i,j}. (2) The subgraph G' induced by h(u_i) is typically not well-connected, as resource sharing is uncommon in the real world. If G' is well-connected, |h(u_i)| is quite small compared with the fraud group size (see observation 2).

In summary, the techniques described in [12, 20, 30–32] work directly on the tensor, indicating that they consider value sharing on each dimension, and even at certain values, as equivalent. In contrast, the ISG assigns each instance of value sharing a theoretical weight via the edges and nodes of the ISG, which is more effective for identifying the (hidden-)densest blocks (comparison in Sec. 6.2).

5 D-SPOT

Based on the observations in Section 4.3, we now describe our method for finding the objective subgraphs in G. This section is divided into two parts: first, we define a density metric F, and then we illustrate the proposed D-Spot algorithm. To find the objective G' = (V', E'), we define the density metric F of G' as in [6, 11]:

F(G') = (Σ_{(u_i,u_j)∈E'} S_{i,j} + Σ_{u_i∈V'} S_i) / |V'|.   (7)

The form of F satisfies the three key observations of G' in Section 4.3:
(1) Keeping |V'| fixed, Σ_{(u_i,u_j)∈E'} S_{i,j} + Σ_{u_i∈V'} S_i ↑ ⇒ F ↑.
(2) Keeping S_{i,j}, S_i, and ρ_edge(G') fixed, |V'| ↑ ⇒ F ↑.
(3) Keeping S_{i,j}, S_i, and |V'| fixed, ρ_edge(G') ↑ ⇒ F ↑.

Thus, our subgraph-detection problem can be defined as follows:

Problem 2 (Detecting dense subgraphs). (1) Input: the information sharing graph G = (V, E). (2) Find: multiple subgraphs of G that maximize F.

In real-world datasets, there are usually numerous fraud groups forming multiple dense subgraphs. Based on the considerations described above, we propose D-Spot (Algorithms 1–3). Compared with other well-known algorithms for finding the densest subgraph [6, 11], D-Spot has two differences:
(1) D-Spot can detect multiple densest subgraphs simultaneously. D-Spot first partitions the graph, and then detects a single densest subgraph in each partition.
(2) D-Spot is faster. First, instead of removing nodes one by one, D-Spot removes a set of nodes at once, reducing the number of iterations. Second, it detects the single densest subgraph in a partition G' = (V', E') rather than in the whole graph G = (V, E), where |V'| << |V| and |E'| << |E|.

D-Spot consists of two main steps: (1) given G, divide G into multiple partitions (Algorithm 1); (2) in each partition, find a single dense subgraph (Algorithms 2 and 3).

Algorithm 1: graph partitioning.
Let Ĝ be a dense subgraph formed by a fraud group that we are about to detect. In G, there are usually multiple such Ĝs, where each Ĝ should be independent of the others or connected to them only by small values of S_{i,j}. Thus, we take the connected components Gs of G as partitions (line 6). For each G' ∈ Gs, we run Algorithms 2 and 3 (lines 7–9) to find Ĝ. Finally, Algorithm 1 returns multiple dense subgraphs Ĝs (line 10). Note that there is a guarantee that Ĝs must contain a Ĝ that achieves at least 1/2 of the optimal subgraph in terms of F (proof in Sec. 6.3).

Information pruning (recommended).
As mentioned before, fraudulent entities usually have surprising similarities that are quantified by S_{i,j}. We want to delete edges with ordinary weights, and thus we provide a threshold for removing edges:

θ = Σ_{(u_i,u_j)∈E} S_{i,j} / (|V|(|V| − 1)).   (8)

It is easy to see that θ (a conservative choice) is the average information over all possible pairs (u_i, u_j). Thus, we iterate through all edges in G and remove those for which S_{i,j} < θ (lines 3–5). In all experiments in this paper, we used θ and reached the expected conclusion: the performance of D-Spot with pruning by θ is hardly different from that with no pruning, while pruning with θ significantly decreases the running cost of D-Spot.

Algorithm 1: Find multiple dense subgraphs in G
Require: G = (V, E), θ (Eq. 8), w() (Eq. 9)
Ensure: Ĝs
1:  Ĝs ← ∅
2:  if pruning is needed then
3:    for each S_{i,j} ∈ E do
4:      if S_{i,j} < θ then
5:        remove S_{i,j} from E
6:  Gs ← connected components of G
7:  for each G' ∈ Gs do
8:    Ĝ ← find_a_dense_subgraph(G', w())
9:    Ĝs ← Ĝs ∪ {Ĝ}
10: return Ĝs

Algorithms 2 and 3: find a dense subgraph.
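The θ pruning step can be sketched in a few lines of Python. The function name and toy weights are ours; θ follows Eq. (8), averaging the total edge information over all |V|(|V|−1) node pairs:

```python
def prune_edges(S_edge, n_nodes):
    """Edge pruning with θ (Eq. 8): keep only edges whose weight is at
    least the average information over all |V|(|V|-1) node pairs.
    Sketch of Algorithm 1, lines 3-5."""
    theta = sum(S_edge.values()) / (n_nodes * (n_nodes - 1))
    return {e: s for e, s in S_edge.items() if s >= theta}

kept = prune_edges({("a", "b"): 10.0, ("b", "c"): 1.0}, 3)
print(sorted(kept))  # only the high-information edge ('a', 'b') survives
```

Because θ averages over all possible pairs rather than only existing edges, it stays small on sparse graphs and so prunes conservatively.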
Initially, let V_c be a copy of V'. In each iteration (lines 5–14), we delete a set of nodes (R, line 6) from V_c until V_c is empty. Of all the sets V_c constructed during the execution of the algorithm, the one maximizing F (recovered via R̂, line 15) is returned as the output of the algorithm. Given a subgraph G' = (V', E'), for a node u_i ∈ V', we define w(u_i, G') as

w(u_i, G') = Σ_{(u_j∈V') ∧ ((u_i,u_j)∈E')} S_{i,j} + S_i.   (9)

Lines 1–4 initialize the parameters used in the algorithm: Dict maintains the w value of each node while it remains in V_c and, once a node is deleted, records its removal order (index).
Algorithm 2: Find a dense subgraph
Require: G = (V, E), w(·) (Eq. 9)
Ensure: Ĝ
1:  V_c ← copy(V)
2:  S_sum ← Σ_{(u_i,u_j)∈E} S_{i,j} + Σ_{u_i∈V} S_i
3:  ∀u ∈ V, Dict[u] ← w(u, G)
4:  index ← 0, F_max ← S_sum / |V_c|, top ← 0
5:  while V_c ≠ ∅ do
6:    R ← {u ∈ V_c : Dict[u] ≤ (2 Σ_{(u_i,u_j)∈E} S_{i,j} + Σ_{u_i∈V_c} S_i) / |V_c|}    ▷ Eq. 10
7:    sort R in increasing order of Dict[u]
8:    for each u ∈ R do
9:      V_c ← V_c − u, S_sum ← S_sum − Dict[u]
10:     index ← index + 1, Dict[u] ← index
11:     F ← S_sum / |V_c|
12:     if F > F_max then
13:       F_max ← F, top ← index
14:     Dict ← update edges(u, V_c, Dict, G)
15: R̂ ← {u ∈ V : Dict[u] > top}
16: return Ĝ (the subgraph induced by R̂)

Algorithm 3: Update edges
Require: u_i, V_c, Dict, G = (V, E)
Ensure: Dict
1: for each u_j ∈ V_c do
2:   if (u_i, u_j) ∈ E then
3:     Dict[u_j] ← Dict[u_j] − S_{i,j}
4:     remove (u_i, u_j) from E
5: return Dict

Of all the intermediate node sets, Algorithm 2 returns the R̂ that maximizes F. Line 6 determines which nodes R are deleted in each iteration: R = {u ∈ V_c : w(u, G) ≤ w̄}, where the average weight w̄ is given by

w̄ = Σ_{u∈V} w(u, G) / |V| = ( 2 Σ_{(u_i,u_j)∈E} S_{i,j} + Σ_{u_i∈V} S_i ) / |V| ≤ 2 F_G,    (10)

because each edge weight S_{i,j} is counted twice in Σ_{u∈V} w(u, G). In lines 7–14, the nodes in R are removed from V_c in each iteration (in contrast, [11] recomputes all nodes and finds the one with minimal w after every single deletion). As removing only a subset of R may result in a higher value of F, D-Spot records each change of F as if the nodes were removed one by one (lines 8–14). Algorithm 3 describes how the edges are updated after a node is removed, requiring a total of O(|E|) updates over the whole run. Finally, Algorithm 2 returns the subgraph Ĝ induced by R̂, the node set achieving F_max, reconstructed from top and Dict.
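The peeling loop of Algorithms 2 and 3 can be sketched as below. For readability this simplified variant removes one minimum-weight node per iteration (D-Spot itself removes a whole batch R per iteration and sorts it first); the data structures and self-weights `self_w` standing in for S_i are assumptions of the sketch, not the authors' code.

```python
def peel_dense_subgraph(nodes, edges, self_w=None):
    """Greedy peeling in the spirit of Algorithm 2: repeatedly delete the
    node of minimum weight w(u) (Eq. 9), track F = S_sum / |V| after every
    deletion, and return the node set that maximized F."""
    self_w = self_w or {u: 0.0 for u in nodes}
    w = {u: self_w[u] for u in nodes}             # Dict[u] in the paper
    s_sum = sum(edges.values()) + sum(self_w.values())
    adj = {u: {} for u in nodes}
    for (u, v), s in edges.items():
        adj[u][v] = s
        adj[v][u] = s
        w[u] += s
        w[v] += s

    alive = set(nodes)
    best_f = s_sum / len(alive)                   # F of the full graph
    best_cut = 0                                  # removals made at the best point
    removal_order = []
    while alive:
        u = min(alive, key=lambda x: w[x])        # cheapest node
        alive.remove(u)
        s_sum -= w[u]
        for v, s in adj[u].items():               # Algorithm 3: update edges
            if v in alive:
                w[v] -= s
        removal_order.append(u)
        if alive and s_sum / len(alive) > best_f:
            best_f = s_sum / len(alive)
            best_cut = len(removal_order)
    # survivors at the best point are exactly the nodes removed after best_cut
    return set(removal_order[best_cut:])
```

On a toy graph consisting of a unit-weight triangle {a, b, c} with a weakly attached pendant node d, the sketch peels d first and reports the triangle as the densest subgraph.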
Summary. As R contains at least one node per iteration, the worst-case time complexity of Algorithm 2 is O(|V|² + |E|). In practice, the worst case is too pessimistic: in line 6, R usually contains plenty of nodes, significantly reducing the number of scans of V_c (see Section 7.2).

6 ANALYSIS

6.1 Complexity
In the graph-initialization stage, it takes O(K|R| + |E|) time to build G, based on the optimization in Section 4.2. In D-Spot, the cost of partitioning G is O(|E|), and detecting a dense block in a partition G_i = (V_i, E_i) requires O(|E_i| + |V_i|²) operations, where |E_i| ≪ |E| and |V_i| ≪ |V|. Thus, the complexity of ISG+D-Spot is linear with respect to |E|. In the worst case, admittedly, |E| = |V|² when there is some dimension A_k with |R_k| = 1. However, that is too pessimistic: in the targeted fraud attacks, fraud groups typically exhibit strong value sharing while legitimate entities do not. Hence, we expect G to be sparse, because each u_i has positive edges to only a small subset of V. We constructed G for several real-world datasets (see Fig. 3), and the edge densities were all less than 0.06.

Theorem 6 (Spotting the Hidden-Densest Block). Given a dense block
B(A_1, ..., A_N, X) in which the target dimension is U = A_N and V denotes the set of distinct values of A_N, suppose a shared value a exists in B such that ∀u ∈ V, ∃t ∈ B satisfying (t[U] = u) ∧ (t[A_k] = a). Then B must form a dense subgraph G in G.

Proof. Using the optimization algorithm in Section 4.2, we build G by scanning all values in R once. Hence, the block B(a, A_k) must be found. Let G = (V, E) be the subgraph induced by V in G. Then, ∀(u_i, u_j) ∈ E, the edge weight satisfies S_{i,j} ≥ I^k_{i,j}(a). Hence, ρ_edge(G) = 1 and

F_G = ( Σ_{(u_i,u_j)∈E} S_{i,j} + Σ_{u_i∈V} S_i ) / |V| ≥ |V|(|V|−1) I^k_{i,j}(a) / |V| = (|V|−1) I^k_{i,j}(a). □

Observation (Effectiveness of ISG+D-Spot). Consider a hidden-densest block
B(A_1, ..., A_{N−1}, A_N, X) of size |X| × ⋯ × |X| × |B_N| with mass |B| = |X|, i.e., B is densest on A_N through sharing the value a. Then, assuming the target dimension is U = A_1 and the fraudulent entities V are the distinct values of A_1, ISG+D-Spot captures V more accurately than algorithms that operate directly on the tensor (denoted Tensor+Other Algorithms).

Proof. Consider a non-dense block B̂(Â_1, ..., Â_N, X) of size |X| × ⋯ × |X| with mass |B̂| = |X|, and let V̂ denote the distinct values of Â_1. Treating V̂ as legitimate entities and V as fraudulent entities, we now examine the difference between ISG+D-Spot and Tensor+Other Algorithms.

[Working on the tensor]. On R, B̂ is not dense, and thus ρ(B̂, [N]) = 1. For B, because {|B_1|, ..., |B_{N−1}|, |B_N|} = {|X|, ..., |X|, |B_N|}, we have ρ(B, [N]) = |B| N / Σ_{n∈[N]} |B_n| ≈ 1; on the tensor, B is therefore barely distinguishable from the non-dense block B̂.

[Working on ISG]. On G, let Ĝ denote the subgraph induced by V̂ and G the subgraph formed by V. We have F_Ĝ = 0, because B̂ contains no shared values. For G, F_G ≥ (|V|−1) I^k_{i,j}(a), by Theorem 6.

[Other Algorithms]. M-Zoom [30] and D-Cube [32] are guaranteed to find blocks that are at least 1/N of the optimum in terms of ρ on R (an N-approximation guarantee).

[D-Spot]. In Section 6.3, we show that the subgraph detected by D-Spot achieves at least 1/2 of the optimal F on the ISG (a 1/2-approximation guarantee).

In summary, Tensor+Other Algorithms versus ISG+D-Spot corresponds to:

( ρ(B̂, [N]) = 1 | ρ(B, [N]) ≈ 1, with an N-approximation )  vs.  ( F_Ĝ = 0 | F_G ≥ (|V|−1) I^k_{i,j}(a), with a 1/2-approximation ).
Therefore, ISG+D-Spot catches fraudulent entities within hidden-densest blocks more accurately than Tensor+Other Algorithms. □

From this observation, ISG+D-Spot can effectively detect hidden-densest blocks. Moreover, as B becomes denser, the subgraph G formed by B also becomes much denser, and thus ISG+D-Spot is also more accurate in detecting the densest blocks.

For brevity, we use [V] to denote the subgraph induced by a set of nodes V.

Theorem 7 (Algorithm 1 Guarantee). Given G = (V, E), let G_s = {G_1, ..., G_n} denote the connected components of G. Let F_opt_i denote the optimal F on G_i, i.e., no G′ ⊆ G_i satisfies F_G′ > F_opt_i. Then, if F_opt_n is the maximum of {F_opt_1, ..., F_opt_n}, F_opt_n must be the optimum in terms of F on G.

Proof. Given any two node sets V_1 and V_2 with no edges connecting them, write F_[V_1] = c_1/|V_1| and F_[V_2] = c_2/|V_2|, where c_1 and c_2 are the corresponding total weights, and assume F_[V_1] > F_[V_2], i.e., c_1/|V_1| > c_2/|V_2|. Then

F_[V_1] − F_[V_1∪V_2] = c_1/|V_1| − (c_1 + c_2)/(|V_1| + |V_2|) = (c_1|V_2| − c_2|V_1|) / ( |V_1|(|V_1| + |V_2|) ) > 0.

Thus, for any V_1 and V_2 not connected by any edges, F_[V_1∪V_2] ≤ max(F_[V_1], F_[V_2]) (Conclusion 1).

In G_n = (V_n, E_n), let V̂ ⊆ V_n denote the node set with F_[V̂] = F_opt_n, and let V′ be a set of nodes satisfying V′ ⊂ V and V′ ∩ V̂ = ∅. Consider two conditions. First, if V′ ⊂ V_n, then F_[V′] ≤ F_[V̂] and F_[V′∪V̂] ≤ F_[V̂], because F_[V̂] is the optimum on G_n. Second, if V′ ∩ V_n = ∅, then F_[V′] ≤ F_[V̂] and F_[V′∪V̂] ≤ F_[V̂], by Conclusion 1 and because F_[V̂] is the maximum of {F_opt_1, ..., F_opt_n}. If V′ ∩ V_n ≠ ∅, then V′ can be divided into two parts conforming to the two conditions above. Therefore, no V′ ⊂ V satisfies F_[V′] > F_[V̂] or F_[V′∪V̂] > F_[V̂]. We conclude that F_opt_n = F_[V̂] must be the optimum in terms of F on G. □

Theorem 8 (Algorithm 2 Guarantee).
Given a graph G = (V, E), let Q* be a subset of nodes maximizing F_[Q*] in G, and let [Q] be the subgraph returned by Algorithm 2, with score F_[Q]. Then F_[Q] ≥ (1/2) F_[Q*].

Proof. Consider the optimal set Q*. We know that ∀u ∈ Q*, w(u, [Q*]) ≥ F_[Q*], because if we could remove a node u with w(u, [Q*]) < F_[Q*], then

F′ = ( |Q*| F_[Q*] − w(u, [Q*]) ) / ( |Q*| − 1 ) > ( |Q*| F_[Q*] − F_[Q*] ) / ( |Q*| − 1 ) = F_[Q*],

which contradicts the optimality of Q*. Denote the first node that Algorithm 2 removes from Q* by u_i ∈ R, and denote the node set just before Algorithm 2 starts removing R by Q′. Because Q* ⊆ Q′, we have w(u_i, [Q*]) ≤ w(u_i, [Q′]). By line 6 of Algorithm 2, w(u_i, [Q′]) ≤ 2 F_[Q′] (Eq. 10). Additionally, Algorithm 2 returns the best solution encountered while deleting nodes one by one, so F_[Q] ≥ F_[Q′]. We conclude that

F_[Q] ≥ F_[Q′] ≥ (1/2) w(u_i, [Q′]) ≥ (1/2) w(u_i, [Q*]) ≥ (1/2) F_[Q*]. □

In summary, let {G_1, ..., G_n} be the subgraphs returned by D-Spot and {F_G_1, ..., F_G_n} the corresponding scores. Then, based on Theorems 7 and 8, F_max = max(F_G_1, ..., F_G_n) is at least 1/2 of the optimal F on G (a 1/2-approximation guarantee).

7 EXPERIMENTS

A series of evaluation experiments were conducted under the following conditions:
Implementation.
We implemented ISG+D-Spot in Python and conducted all experiments on a server with two 2.20 GHz Intel(R) CPUs and 64 GB memory.
Baselines.
We selected several state-of-the-art dense-block detection methods, M-Zoom [30], M-Biz [31], and D-Cube [32], as the baselines (using their open-source code). To obtain their best performance, we ran each baseline with three different density metrics from [32]: Arithmetic Average Mass (ari), Geometric Average Mass (geo), and Suspiciousness (sus).

Suspiciousness Score Setting. For the baselines, given a detected block B(A_1, ..., A_N, X), we let θ = ρ(B, [N]); for any unique value a within B, we set the suspiciousness score of a to θ. If a occurred in multiple detected dense blocks, we kept the one with the maximal value of θ. For ISG+D-Spot, given a detected subgraph Ĝ = (V̂, Ê) and a unique value a ∈ V̂, we set the suspiciousness score of a to w(a, Ĝ) (Eq. 9). Finally, we evaluated the resulting rankings of suspiciousness scores using the standard area under the ROC curve (AUC) metric.

Table 3 summarizes the datasets used in the experiments.
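The AUC over such a suspiciousness ranking can be computed directly from ranks via the Mann-Whitney statistic; the following self-contained sketch (our own helper, not part of the paper's code) averages ranks over tied scores, which matters because many baseline blocks assign the same θ to every value they contain.

```python
def auc_score(labels, scores):
    """AUC of a suspiciousness ranking: the probability that a randomly
    chosen positive (label 1) outranks a randomly chosen negative, with
    average ranks assigned to tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):                      # walk groups of tied scores
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0              # 1-based average rank of the group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    pos_ranks = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos, n_neg = len(pos_ranks), len(labels) - len(pos_ranks)
    # Mann-Whitney U statistic normalized by the number of pos/neg pairs
    return (sum(pos_ranks) - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

A perfect ranking gives 1.0, a reversed ranking 0.0, and uniform ties 0.5, matching the convention used throughout the tables below.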
Synthetic is a series of datasets we synthesized using the same method as in [12]. First, we generated a random seven-dimensional relation R(A_1, ..., A_7, X) with |R| = 10K entries, in which A_1 (of cardinality 1000) corresponds to users and the other six dimensions are features. To specifically check each method's ability to detect the hidden-densest block, we injected a dense block B(A_1, ..., A_7, X) into R five separate times, assigning each injection a different configuration to generate five datasets. For B, |B_1| = 50, and λ denotes the number of dimensions on which B is the densest (each such dimension has cardinality 25 within B). For example, when λ = 1, B is the densest on only one feature dimension; when λ = 5, on five of them. Obviously, ρ(B) > ρ(R), and B is the hidden-densest block when λ is small. Finally, we labeled the users within B as "fraud".

Amazon [15]. AmaOffice, AmaBaby, and AmaTools are three collections of reviews of office products, baby products, and tool products, respectively, on Amazon. They can be modeled by the relation R(user, product, timestamp, X). Each entry t = (u, p, t, x) ∈ R indicates a review x in which user u reviewed product p at time t. According to the specific cases of fraud discovered in previous studies [12, 38], fraudulent groups usually exhibit suspicious synchronized behavior in social networks; for instance, a large group of users may surprisingly review the same group of products over a short period. Thus, following [12, 30, 32, 38], we represent this synchronized behavior by an injected dense block B(user, product, timestamp, X) spanning 200 users, a comparatively small group of products, and a single time bin. In total, we injected four such blocks, each with a randomly chosen mass. The users in the injected blocks were labeled as "malicious."

Yelp [25]. The YelpChi, YelpNYU, and YelpZip datasets [21, 25] contain restaurant reviews submitted to Yelp. They can be represented by the relation R(user, restaurant, date, X), where each entry t = (u, r, d, x) denotes a review x by user u of restaurant r on date d. All three datasets include labels indicating whether or not each review is fake. The detection of malicious reviews and users is studied in [25] using text information; here we focus on detecting fraudulent restaurants that purchase fake reviews, using the three-dimensional features. Intuitively, the more fake reviews a restaurant has, the more suspicious it is. As some legitimate users may also review fraudulent restaurants, we label a restaurant as "fraudulent" if it has received more than 40 fake reviews.

DARPA [19] was collected by the Cyber Systems and Technology Group in 1998 and records network attacks in TCP dumps. The data has the form R(sourceIP, targetIP, timestamp, X). Each entry t = (IP_1, IP_2, t) represents a connection made from IP_1 to IP_2 at time t. The dataset includes labels indicating whether or not each connection is malicious. In practice, the punishment for malicious connections is to block the corresponding IP address, so we compared the detection performance on suspicious IP addresses, labeling an IP address as suspicious if it was involved in any malicious connection.

AirForce [1] was used for the KDD Cup 1999 and has also been considered in [30, 32]. This dataset includes a wide variety of simulated intrusions in a military network environment; however, it does not contain any specific IP addresses.
According to the cardinality of each dimension, we chose the top-2 features and built the relation R(src_bytes, dst_bytes, connections, X), where src_bytes denotes the number of data bytes sent from source to destination and dst_bytes the number of data bytes sent from destination to source. The target dimension U was set to connections. Note that this dataset includes labels indicating whether or not each connection is malicious.

Table 3: Multi-dimensional datasets used in our experiments

                 Synthetic | TCP Dumps         | Review Data
Dataset        | Synthetic | DARPA  | AirForce | YelpChi | YelpNYU | YelpZip | AmaOffice | AmaBaby | AmaTools
Entries (Mass) | 10K       | 4.6M   | 30K      | 67K     | 359K    | 1.14M   | 53K       | 160K    | 134K
Dimensions     | 7         | 3      | 3        | 3       | 3       | 3       | 3         | 3       | 3

First, we measured the speed and accuracy with which D-Spot detects dense subgraphs in real-world graphs. We compared the performance of D-Spot with that of another dense-subgraph detection algorithm, Fraudar [11], an extension of [6], which maximizes the density metric by greedily deleting nodes one by one. We used the three Amazon datasets and applied D-Spot and Fraudar to the same bipartite graph built on the first two dimensions, users and products, where each edge in the graph represents an entry. We measured the wall-clock time (averaged over three runs) required to detect the top-4 subgraphs. Figure 2 illustrates the runtime and performance of the two algorithms.
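The bipartite graph both algorithms consume can be built in one pass over the relation; the sketch below assumes a (user, product, timestamp, rating) tuple schema for illustration and weights each user-product edge by the number of entries connecting the pair.

```python
from collections import Counter

def bipartite_edges(entries):
    """Collapse (user, product, ...) tuples into a weighted user-product
    bipartite graph: the edge weight counts how many entries (reviews)
    connect the pair.  Sketch with an assumed schema, not the paper's code."""
    weights = Counter((u, p) for (u, p, *_rest) in entries)
    return dict(weights)
```

The resulting edge dictionary is exactly the kind of input a greedy peeling routine (D-Spot or Fraudar) can operate on.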
Figure 2: Comparison of D-Spot and Fraudar using the Amazon datasets.
D-Spot provides the best trade-off between speed and accuracy. Specifically, D-Spot is up to 11× faster than Fraudar. This supports our claim in Section 5.2 that the worst-case time complexity of D-Spot, O(|V|² + |E|), is too pessimistic.

This section illustrates the effectiveness of ISG+D-Spot for detecting fraudulent entities in multi-dimensional tensors. ISG+D-Spot exhibits extraordinary performance compared with the baseline methods (Fraudar is excluded from the baselines, as it only works on bipartite graphs).
Synthetic.
Table 4 presents the detection performance of each method for the hidden-densest block. We assume that the injected block B is the hidden-densest block when λ ≤ 3. In detail, ISG+D-Spot achieves extraordinary performance even when λ = 1, because each instance of value sharing in B is accurately captured by ISG and D-Spot, providing a higher accuracy guarantee than the baselines (Theorems 2 and 3). When λ > 3, the performance of every method improves, because the density of B increases with λ.

Amazon. Table 5 presents the results for catching suspicious users by detecting the top-4 dense blocks in the Amazon datasets. ISG+D-Spot detects the synchronized behavior accurately. The typical attack scenario involves a mass of fraudulent users creating massive numbers of fake reviews for a comparatively small group of products over a short period; this behavior is represented by the injected blocks. ISG+D-Spot exhibits robust and near-perfect performance, whereas the baselines perform worse on the AmaOffice and AmaBaby datasets, even with the multiple supported metrics.
Table 4: Performance (AUC) on the Synthetic datasets, for λ = 1, ..., 5. Methods compared: M-Zoom, M-Biz, and D-Cube (each with the ari, geo, and sus density metrics) and ISG+D-Spot.
Table 5: Performance (AUC) on the Amazon datasets (AmaOffice, AmaBaby, AmaTools). Methods compared: M-Zoom, M-Biz, and D-Cube (each with the ari, geo, and sus density metrics) and ISG+D-Spot.
Yelp. Table 6 reports the (highest) accuracy with which collusive restaurants were detected by each method. In summary, ISG+D-Spot achieves the highest accuracy across all three datasets, because D-Spot enjoys a higher theoretical bound on the ISG.
Table 6: Performance (AUC) on the Yelp datasets (YelpChi, YelpNYU, YelpZip). Methods compared: M-Zoom, M-Biz, and D-Cube (each with the ari, geo, and sus density metrics) and ISG+D-Spot.
DARPA. Table 7 lists the accuracy of each method for detecting the source IPs and the target IPs. ISG+D-Spot assigns each IP address a specific suspiciousness score. We examined the detected IP address with the highest score and found that it participated in more than 1M malicious connections; the top ten suspicious IPs were all involved in more than 10k malicious connections. Thus, ISG+D-Spot would enable us to crack down on these malicious IP addresses in the real world.
Table 7: Performance (AUC) on the DARPA dataset, for U = source IP and U = target IP. Methods compared: M-Zoom, M-Biz, and D-Cube (each with the ari, geo, and sus density metrics) and ISG+D-Spot.
AirForce. As this dataset does not contain IP addresses, we set the target dimension U = connections and randomly sampled 30k connections from the dataset [1] three times. Table 8 lists the accuracy of each method on samples 1–3. Malicious connections form dense blocks on the two-dimensional features, and the results demonstrate that ISG+D-Spot effectively detects the densest blocks.

Table 8: Performance (AUC) on the AirForce dataset
Columns: Sample 1, Sample 2, Sample 3. Methods compared: M-Zoom, M-Biz, and D-Cube (each with the ari, geo, and sus density metrics) and ISG+D-Spot.
As mentioned in Section 6.1, the ISGs built from real-world tensors are typically sparse, as value sharing should only appear conspicuously among fraudulent entities. We built G for the three Amazon datasets (details in Figure 3); the edge densities of G are quite low (below 0.06) across all datasets, which indicates that the worst case discussed in Section 6.1 rarely occurs. Figure 3 also reports the runtime of ISG+D-Spot on the three Amazon datasets, where the number of edges was varied by subsampling entries of each dataset. In practice, |E| increases near-linearly with the mass of the dataset, and because the time complexity of ISG+D-Spot is linear in |E|, ISG+D-Spot scales near-linearly with the mass of the dataset, as Figure 3 demonstrates.
Figure 3: ISG+D-Spot runs in near-linear time with respect to the mass of the dataset.
This section demonstrates that ISG+D-Spot is more robust to noisy features than existing approaches. ISG automatically weighs each feature and continuously accumulates value sharing in a single scan of the tensor, and D-Spot then finds the entities with the maximum accumulated value sharing. We conducted the following experiment to demonstrate this.
Registration is a dataset derived from an e-commerce company, in which each record contains two crucial features, IP subnet and phone prefix, and three noisy features, IP city, phone city, and timestamp. The dataset also includes labels showing whether or not each account is a "zombie" account. It can thus be formulated as R(accounts, IP, phone, IP city, phone city, timestamp, X). To compare the detection performance on malicious accounts, we applied each method to variants of R obtained by successively appending the 1–5 features to R(accounts, X).

Table 9: Performance (AUC) on the Registration dataset. 'C' represents 'crucial feature' and 'N' represents 'noisy feature'
Columns: 1C, 2C, 2C+1N, 2C+2N, 2C+3N. Methods compared: M-Zoom, M-Biz, and D-Cube (each with the ari, geo, and sus density metrics) and ISG+D-Spot.
Table 9 shows how each method varies as noisy features are appended (dimensions 3–5). As each account possesses only one entry, R is quite sparse. We found that existing methods often miss small-scale instances of value sharing because their density is close to the legitimate range on R; for example, a 51-member group sharing a single IP subnet was missed by the baseline methods. ISG, however, amplifies each instance of value sharing through its information-theoretic and graph features, allowing D-Spot to capture the fraudulent entities accurately.

8 CONCLUSION

In this paper, we identified dense-block detection with dense-subgraph mining by modeling a tensor as an ISG. Additionally, we proposed D-Spot, an algorithm for detecting multiple dense subgraphs that is faster than existing methods and can be computed in parallel. In future work, ISG+D-Spot will be implemented on Apache Spark [39] to support very large tensors.
REFERENCES
[1] 1999. KDD Cup 1999 data. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
[2] Shifu Hou, Yanfang Ye, Yangqiu Song, and Melih Abdulhayoglu. 2017. HinDroid: An Intelligent Android Malware Detection System Based on Structured Heterogeneous Information Network. In ACM SIGKDD. 1507–1515.
[3] Leman Akoglu, Rishi Chandy, and Christos Faloutsos. 2013. Opinion Fraud Detection in Online Reviews by Network Effects. In ICWSM. The AAAI Press.
[4] David S. Anderson, Chris Fleizach, Stefan Savage, and Geoffrey M. Voelker. 2007. Spamscatter: Characterizing Internet Scam Hosting Infrastructure. In USENIX Security Symposium. 1132–1141.
[5] Qiang Cao, Xiaowei Yang, Jieqi Yu, and Christopher Palow. 2014. Uncovering Large Groups of Active Malicious Accounts in Online Social Networks. In ACM CCS. 477–488.
[6] Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In International Workshop on Approximation Algorithms for Combinatorial Optimization. Springer, 84–95.
[7] Jie Chen and Yousef Saad. 2012. Dense Subgraph Extraction with Application to Community Detection. IEEE TKDE 24, 7 (2012), 1216–1230.
[8] Hector Garcia-Molina and Jan Pedersen. 2004. Combating web spam with TrustRank. In VLDB. 576–587.
[9] Saptarshi Ghosh, Bimal Viswanath, Farshad Kooti, Naveen Kumar Sharma, Gautam Korlam, Fabricio Benevenuto, Niloy Ganguly, and Krishna Phani Gummadi. 2012. Understanding and combating link farming in the Twitter social network. In WWW.
[10] A. V. Goldberg. 1984. Finding a Maximum Density Subgraph. Technical Report.
[11] Bryan Hooi, Hyun Ah Song, Alex Beutel, Neil Shah, Kijung Shin, and Christos Faloutsos. 2016. FRAUDAR: Bounding Graph Fraud in the Face of Camouflage. In ACM SIGKDD. 895–904.
[12] Meng Jiang, Alex Beutel, Peng Cui, Bryan Hooi, Shiqiang Yang, and Christos Faloutsos. 2016. Spotting Suspicious Behaviors in Multimodal Data: A General Metric and Algorithms. IEEE TKDE 28, 8 (2016), 2187–2200.
[13] Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2014. CatchSync: Catching Synchronized Behavior in Large Directed Graphs. In ACM SIGKDD. 941–950.
[14] Meng Jiang, Peng Cui, Alex Beutel, Christos Faloutsos, and Shiqiang Yang. 2016. Inferring lockstep behavior from connectivity pattern in large graphs. Knowledge & Information Systems 48, 2 (2016), 399–428.
[15] Julian McAuley. [n. d.]. Amazon product data. http://jmcauley.ucsd.edu/data/amazon/.
[16] Samir Khuller and Barna Saha. 2009. On Finding Dense Subgraphs. In Automata, Languages and Programming, International Colloquium, ICALP 2009, Rhodes, Greece, July 5–12, 2009, Proceedings. 597–608.
[17] Tamara G. Kolda and Brett W. Bader. 2009. Tensor Decompositions and Applications. SIAM Review 51, 3 (2009), 455–500.
[18] Victor E. Lee, Ning Ruan, Ruoming Jin, and Charu Aggarwal. 2010. A Survey of Algorithms for Dense Subgraph Discovery. 303–336.
[19] R. P. Lippmann, D. J. Fried, I. Graf, J. W. Haines, K. R. Kendall, D. McClung, D. Weber, S. E. Webster, D. Wyschogrod, R. K. Cunningham, and M. A. Zissman. 2000. Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation. In Proceedings DARPA Information Survivability Conference and Exposition (DISCEX'00), Vol. 2. 12–26. https://doi.org/10.1109/DISCEX.2000.821506
[20] Koji Maruhashi, Fan Guo, and Christos Faloutsos. 2011. MultiAspectForensics: Pattern Mining on Large-Scale Heterogeneous Networks with Tensor Analysis. In ASONAM. 203–210.
[21] Arjun Mukherjee, Vivek Venkataraman, Bing Liu, and Natalie Glance. 2013. What Yelp Fake Review Filter Might Be Doing?. In ICWSM 2013. AAAI Press, 409–418.
[22] Shashank Pandit, Duen Horng Chau, Samuel Wang, and Christos Faloutsos. 2007. NetProbe: a fast and scalable system for fraud detection in online auction networks. In WWW. 201–210.
[23] Evangelos E. Papalexakis, Christos Faloutsos, and Nicholas D. Sidiropoulos. 2012. ParCube: Sparse Parallelizable Tensor Decompositions. In PKDD. 521–536.
[24] B. Aditya Prakash, Ashwin Sridharan, Mukund Seshadri, Sridhar Machiraju, and Christos Faloutsos. 2010. EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs. In PAKDD.
[25] Shebuti Rayana and Leman Akoglu. 2015. Collective Opinion Spam Detection: Bridging Review Networks and Metadata. In ACM SIGKDD.
[26] Barna Saha, Allison Hoch, Samir Khuller, Louiqa Raschid, and Xiao-Ning Zhang. 2010. Dense Subgraphs with Restrictions and Applications to Gene Annotation Graphs. Springer Berlin Heidelberg. 456–472.
[27] Neil Shah, Alex Beutel, Brian Gallagher, and Christos Faloutsos. 2014. Spotting Suspicious Link Behavior with fBox: An Adversarial Perspective. In IEEE ICDM. 959–964.
[28] C. E. Shannon. 1948. A Mathematical Theory of Communication. Bell System Technical Journal 27, 4 (1948), 379–423.
[29] K. Shin, T. Eliassi-Rad, and C. Faloutsos. 2017. CoreScope: Graph Mining Using k-Core Analysis – Patterns, Anomalies and Algorithms. In ICDM. 469–478.
[30] Kijung Shin, Bryan Hooi, and Christos Faloutsos. 2016. M-Zoom: Fast Dense-Block Detection in Tensors with Quality Guarantees. In ECML PKDD. 264–280.
[31] Kijung Shin, Bryan Hooi, and Christos Faloutsos. 2018. Fast, Accurate, and Flexible Algorithms for Dense Subtensor Mining. ACM Transactions on Knowledge Discovery from Data 12, 3 (2018), 1–30.
[32] Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos. 2017. D-Cube: Dense-Block Detection in Terabyte-Scale Tensors. In WSDM. 681–689.
[33] Kijung Shin and U Kang. 2015. Distributed Methods for High-Dimensional and Large-Scale Tensor Factorization. In IEEE ICDM. 989–994.
[34] Ming-Yang Su. 2011. Real-time anomaly detection systems for Denial-of-Service attacks by weighted k-nearest-neighbor classifiers. Expert Systems with Applications 38, 4 (2011), 3492–3498.
[35] Hua Tang and Zhuolin Cao. 2009. Machine Learning-based Intrusion Detection Algorithms. Journal of Computational Information Systems (2009), 1825–1831.
[36] Kurt Thomas, Dmytro Iatskiv, Elie Bursztein, Tadek Pietraszek, Chris Grier, and Damon McCoy. 2014. Dialing Back Abuse on Phone Verified Accounts. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. ACM, 465–476.
[37] Yining Wang, Hsiao-Yu Tung, Alexander Smola, and Animashree Anandkumar. 2015. Fast and Guaranteed Tensor Decomposition via Sketching. In NIPS.
[38] Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, and Christos Faloutsos. 2013. CopyCatch: stopping group attacks by spotting lockstep behavior in social networks. In WWW. 119–130.
[39] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. In NSDI.