Bryan Hooi
Carnegie Mellon University
Publication
Featured research published by Bryan Hooi.
SIAM International Conference on Data Mining | 2016
Bryan Hooi; Neil Shah; Alex Beutel; Stephan Günnemann; Leman Akoglu; Mohit Kumar; Disha Makhija; Christos Faloutsos
Review fraud is a pervasive problem in online commerce, in which fraudulent sellers write or purchase fake reviews to manipulate perception of their products and services. Fake reviews are often detected based on several signs, including (1) they occur in short bursts of time and (2) fraudulent user accounts have skewed rating distributions. However, either sign may be absent in any given dataset. Hence, in this paper, we propose an approach for detecting fraudulent reviews which combines these two signals in a principled manner, allowing successful detection even when one of them is not present. To combine them, we formulate our Bayesian Inference for Rating Data (BIRD) model, a flexible Bayesian model of user rating behavior. Based on our model, we formulate a likelihood-based suspiciousness metric, Normalized Expected Surprise Total (NEST). We propose a linear-time algorithm for performing Bayesian inference using our model and computing the metric. Experiments on real data show that BIRDNEST successfully spots review fraud in large, real-world graphs: the 50 most suspicious users of the Flipkart platform flagged by our algorithm were investigated, and all were identified as fraudulent by domain experts at Flipkart.
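To make the idea of a Bayesian, likelihood-based suspiciousness score concrete, here is a minimal Python sketch. It is not the BIRD model or the NEST metric from the paper: it assumes a much simpler stand-in, a single Dirichlet prior centered on the global rating distribution, and scores each user by the KL divergence between the user's posterior mean rating distribution and the global one. The function names and the `prior_strength` parameter are hypothetical.

```python
import math


def dirichlet_posterior_mean(counts, prior):
    """Posterior mean of a categorical distribution under a Dirichlet prior."""
    total = sum(counts) + sum(prior)
    return [(c + a) / total for c, a in zip(counts, prior)]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def surprise_scores(user_counts, prior_strength=5.0):
    """Score users by how far their smoothed rating distribution is from the global one.

    user_counts maps a user id to a list of rating counts (e.g. 1 to 5 stars).
    This is a simplified illustration, not the paper's model: the prior is a
    single Dirichlet whose pseudo-counts follow the global rating distribution.
    """
    n_levels = len(next(iter(user_counts.values())))
    global_counts = [sum(c[k] for c in user_counts.values()) for k in range(n_levels)]
    total = sum(global_counts)
    # Add-one smoothing keeps every global probability strictly positive.
    global_dist = [(g + 1) / (total + n_levels) for g in global_counts]
    prior = [prior_strength * g for g in global_dist]
    return {
        user: kl_divergence(dirichlet_posterior_mean(counts, prior), global_dist)
        for user, counts in user_counts.items()
    }


if __name__ == "__main__":
    ratings = {
        "a": [0, 0, 0, 0, 1],    # one 5-star rating: little evidence
        "b": [0, 0, 0, 0, 50],   # fifty 5-star ratings: strong evidence of skew
        "c": [2, 3, 10, 20, 15],
    }
    for user, score in sorted(surprise_scores(ratings).items()):
        print(user, round(score, 4))
```

Because the prior shrinks sparse users toward the global distribution, a user with a single extreme rating scores lower than a user with many extreme ratings, which is the kind of behavior a Bayesian treatment is meant to provide.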
European Conference on Machine Learning | 2016
Kijung Shin; Bryan Hooi; Christos Faloutsos
Given a large-scale and high-order tensor, how can we find dense blocks in it? Can we find them in near-linear time but with a quality guarantee? Extensive previous work has shown that dense blocks in tensors as well as graphs indicate anomalous or fraudulent behavior (e.g., lockstep behavior in social networks). However, available methods for detecting such dense blocks are not satisfactory in terms of speed, accuracy, or flexibility. In this work, we propose M-Zoom, a flexible framework for finding dense blocks in tensors, which works with a broad class of density measures. M-Zoom has the following properties: (1) Scalable: M-Zoom scales linearly with all aspects of tensors and is up to 114× faster than state-of-the-art methods with similar accuracy. (2) Provably accurate: M-Zoom provides a guarantee on the lowest density of the blocks it finds. (3) Flexible: M-Zoom supports multi-block detection and size bounds as well as diverse density measures. (4) Effective: M-Zoom successfully detected edit wars and bot activities in Wikipedia, and spotted network attacks from a TCP dump with near-perfect accuracy (AUC = 0.98). The data and software related to this paper are available at http://www.cs.cmu.edu/~kijungs/codes/mzoom/.
Web Search and Data Mining | 2017
Kijung Shin; Bryan Hooi; Jisu Kim; Christos Faloutsos
IEEE Transactions on Knowledge and Data Engineering | 2016
Meng Jiang; Alex Beutel; Peng Cui; Bryan Hooi; Shiqiang Yang; Christos Faloutsos
International Conference on Data Mining | 2016
Neil Shah; Alex Beutel; Bryan Hooi; Leman Akoglu; Stephan Günnemann; Disha Makhija; Mohit Kumar; Christos Faloutsos
PLOS ONE | 2016
Evangelos E. Papalexakis; Bryan Hooi; Konstantinos Pelechrinis; Christos Faloutsos
Web Search and Data Mining | 2018
Srijan Kumar; Bryan Hooi; Disha Makhija; Mohit Kumar; Christos Faloutsos; V.S. Subrahmanian
How can we detect fraudulent lockstep behavior in large-scale multi-aspect data (i.e., tensors)? Can we detect it when data are too large to fit in memory or even on a disk? Past studies have shown that dense blocks in real-world tensors (e.g., social media, Wikipedia, TCP dumps) signal anomalous or fraudulent behavior such as retweet boosting, bot activities, and network attacks. Thus, various approaches, including tensor decomposition and search, have been used for rapid and accurate dense-block detection in tensors. However, all such methods have low accuracy, or assume that tensors are small enough to fit in main memory, which is not true in many real-world applications such as social media and the web. To overcome these limitations, we propose D-Cube, a disk-based dense-block detection method, which can also be run in a distributed manner across multiple machines. Compared with state-of-the-art methods, D-Cube is (1) Memory Efficient: requires up to 1,600 times less memory and handles 1,000 times larger data (2.6 TB), (2) Fast: up to 5 times faster due to its near-linear scalability with all aspects of data, (3) Provably Accurate: gives a guarantee on the densities of the blocks it finds, and (4) Effective: successfully spotted network attacks from TCP dumps and synchronized behavior in rating data with the highest accuracy.
European Conference on Machine Learning | 2017
Hyun Ah Song; Bryan Hooi; Marko Jereminov; Amritanshu Pandey; Lawrence T. Pileggi; Christos Faloutsos
Many commercial products and academic research activities are embracing behavior analysis as a technique for improving detection of attacks of many sorts, from retweet boosting and hashtag hijacking to link advertising. Traditional approaches focus on detecting dense blocks in the adjacency matrix of graph data and, more recently, in the tensors of multimodal data. No method gives a principled way to score the suspiciousness of dense blocks with different numbers of modes and rank them to draw human attention accordingly. In this paper, we first give a list of axioms that any metric of suspiciousness should satisfy; we propose an intuitive, principled metric that satisfies the axioms and is fast to compute; moreover, we propose CrossSpot, an algorithm to spot dense blocks that are worth inspecting, typically indicating fraud or some other noteworthy deviation from the usual, and sort them in order of importance (“suspiciousness”). Finally, we apply CrossSpot to real data, where it improves the F1 score over previous techniques by 68 percent and finds suspicious behavioral patterns in social datasets spanning 0.3 billion posts.
International World Wide Web Conferences | 2017
Tsubasa Takahashi; Bryan Hooi; Christos Faloutsos
Given a network with attributed edges, how can we identify anomalous behavior? Networks with edge attributes are ubiquitous, and capture rich information about interactions between nodes. In this paper, we aim to utilize exactly this information to discern suspicious from typical behavior in an unsupervised fashion, which suits the scarcity of ground-truth labels typical of practical anomaly detection scenarios. Our work has a number of notable contributions, including (a) formulation: while most other graph-based anomaly detection works use structural graph connectivity or node information, we focus on the new problem of leveraging edge information, (b) methodology: we introduce EdgeCentric, an intuitive and scalable compression-based approach for detecting edge-attributed graph anomalies, and (c) practicality: we show that EdgeCentric successfully spots numerous such anomalies in several large, edge-attributed real-world graphs, including the Flipkart e-commerce graph with over 3 million product reviews between 1.1 million users and 545 thousand products, where it achieved 0.87 precision over the top 100 results.
European Conference on Machine Learning | 2017
Bryan Hooi; Shenghua Liu; Asim Smailagic; Christos Faloutsos
Complex networks have been shown to exhibit universal properties, with one of the most consistent patterns being the scale-free degree distribution, but are there regularities obeyed by the r-hop neighborhood in real networks? We answer this question by identifying another power-law pattern that describes the relationship between the fraction of node pairs C(r) within r hops and the hop count r. This scale-free distribution is pervasive and describes a large variety of networks, ranging from social and urban to technological and biological networks. In particular, inspired by the definition of the fractal correlation dimension D2 on a point-set, we consider the hop count r to be the underlying distance metric between two vertices of the network, and we examine the scaling of C(r) with r. We find that this relationship follows a power law in real networks within the range 2 ≤ r ≤ d, where d is the effective diameter of the network, that is, the 90th-percentile distance. We term this relationship the power-hop and the corresponding power-law exponent the power-hop exponent h. We provide theoretical justification for this pattern under successful existing network models, and we analyze a large set of real and synthetic network datasets to show the pervasiveness of the power-hop.
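The quantities named in this abstract can be illustrated with a short Python sketch. It assumes an undirected, unweighted graph given as an adjacency dictionary in which every node appears as a key, computes C(r) by breadth-first search from every node, takes the effective diameter as the smallest r at which C(r) reaches 90% of the reachable pairs, and estimates the exponent h as the least-squares slope of log C(r) against log r. The function names are hypothetical, and the fitting procedure is an assumption rather than the authors' published method.

```python
import math
from collections import deque


def hop_pair_fractions(adj):
    """Return {r: C(r)}: the fraction of ordered node pairs (u != v) within r hops."""
    n = len(adj)
    exact = {}  # exact[r] = number of ordered pairs at distance exactly r
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    exact[dist[v]] = exact.get(dist[v], 0) + 1
                    queue.append(v)
    if not exact:
        return {}  # graph has no edges
    total_pairs = n * (n - 1)
    fractions, running = {}, 0
    for r in range(1, max(exact) + 1):
        running += exact.get(r, 0)
        fractions[r] = running / total_pairs
    return fractions


def effective_diameter(fractions, quantile=0.9):
    """Smallest r whose C(r) covers `quantile` of all reachable pairs."""
    reachable = max(fractions.values())
    return min(r for r, c in fractions.items() if c >= quantile * reachable)


def power_hop_exponent(fractions, r_min=2, r_max=None):
    """Least-squares slope of log C(r) versus log r over r_min <= r <= r_max."""
    points = [
        (math.log(r), math.log(c))
        for r, c in fractions.items()
        if r >= r_min and (r_max is None or r <= r_max)
    ]
    if len(points) < 2:
        raise ValueError("need at least two hop counts in the fitting range")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


if __name__ == "__main__":
    # Toy example: a path graph on 10 nodes.
    path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
    c = hop_pair_fractions(path)
    d = effective_diameter(c)
    print("effective diameter:", d)
    print("estimated exponent:", power_hop_exponent(c, 2, d))
```

Running a breadth-first search from every node costs time proportional to the number of nodes times the number of edges, so this sketch only suits small graphs; larger networks would need sampling or an approximate neighborhood function.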