Leman Akoglu | Researchain

Publication

Featured research published by Leman Akoglu.

Knowledge Discovery and Data Mining | 2010

OddBall: spotting anomalies in weighted graphs

Leman Akoglu; Mary McGlohon; Christos Faloutsos

Given a large, weighted graph, how can we find anomalies? Which rules should be violated before we label a node as an anomaly? We propose the oddball algorithm to find such nodes. The contributions are the following: (a) we discover several new rules (power laws) in density, weights, ranks and eigenvalues that seem to govern the so-called “neighborhood sub-graphs” and we show how to use these rules for anomaly detection; (b) we carefully choose features, and design oddball, so that it is scalable and it can work unsupervised (no user-defined constants); and (c) we report experiments on many real graphs with up to 1.6 million nodes, where oddball indeed spots unusual nodes that agree with intuition.
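As a rough illustration of contribution (a), the sketch below computes two features of each node's neighborhood sub-graph, fits a power law between them in log-log space, and scores nodes by how far they fall from the fit. The choice of feature pair (number of neighbors versus number of edges), the function names, and the deviation score are assumptions made for this example; the abstract does not specify them.

```python
import math

def egonet_features(adj):
    """adj maps node -> set of neighbours (undirected, no self-loops).
    Returns node -> (number of neighbours, number of edges in its egonet)."""
    feats = {}
    for u, nbrs in adj.items():
        ego = nbrs | {u}
        # Each egonet edge is seen from both endpoints, hence the division by 2.
        edges = sum(len(adj[v] & ego) for v in ego) // 2
        feats[u] = (len(nbrs), edges)
    return feats

def fit_power_law(xs, ys):
    """Least-squares fit of y = c * x**theta, done on log(x), log(y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    theta = sxy / sxx
    c = math.exp(my - theta * mx)
    return c, theta

def deviation_score(x, y, c, theta):
    """Assumed score: ratio to the fitted value, weighted by log distance.
    Zero when the point lies exactly on the fitted curve."""
    expected = c * x ** theta
    ratio = max(y, expected) / min(y, expected)
    return ratio * math.log(abs(y - expected) + 1)
```

A node whose egonet has far more (or far fewer) edges than the fitted law predicts for its degree receives a high score and would be a candidate for inspection.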

Data Mining and Knowledge Discovery | 2015

Graph based anomaly detection and description: a survey

Leman Akoglu; Hanghang Tong; Danai Koutra

Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have been of focus recently. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised versus (semi-)supervised approaches, for static versus dynamic graphs, for attributed versus plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.

Knowledge Discovery and Data Mining | 2008

Weighted graphs and disconnected components: patterns and a generator

Mary McGlohon; Leman Akoglu; Christos Faloutsos

The vast majority of earlier work has focused on graphs which are both connected (typically by ignoring all but the giant connected component), and unweighted. Here we study numerous, real, weighted graphs, and report surprising discoveries on the way in which new nodes join and form links in a social network. The motivating questions were the following: How do connected components in a graph form and change over time? What happens after new nodes join a network -- how common are repeated edges? We study numerous diverse, real graphs (citation networks, networks in social media, internet traffic, and others); and make the following contributions: (a) we observe that the non-giant connected components seem to stabilize in size, (b) we observe the weights on the edges follow several power laws with surprising exponents, and (c) we propose an intuitive, generative model for graph growth that obeys observed patterns.

Knowledge Discovery and Data Mining | 2014

Focused clustering and outlier detection in large attributed graphs

Bryan Perozzi; Leman Akoglu; Patricia Iglesias Sánchez; Emmanuel Müller

Graph clustering and graph outlier detection have been studied extensively on plain graphs, with various applications. Recently, algorithms have been extended to graphs with attributes, as often observed in the real world. However, all of these techniques fail to incorporate the user preference into graph mining, and thus lack the ability to steer algorithms to more interesting parts of the attributed graph. In this work, we overcome this limitation and introduce a novel user-oriented approach for mining attributed graphs. The key aspect of our approach is to infer user preference by the so-called focus attributes through a set of user-provided exemplar nodes. In this new problem setting, clusters and outliers are then simultaneously mined according to this user preference. Specifically, our FocusCO algorithm identifies the focus, extracts focused clusters and detects outliers. Moreover, FocusCO scales well with graph size, since we perform a local clustering of interest to the user rather than global partitioning of the entire graph. We show the effectiveness and scalability of our method on synthetic and real-world graphs, as compared to both existing graph clustering and outlier detection approaches.

European Conference on Machine Learning | 2010

Surprising patterns for the call duration distribution of mobile phone users

Pedro O. S. Vaz de Melo; Leman Akoglu; Christos Faloutsos; Antonio Alfredo Ferreira Loureiro

How long are the phone calls of mobile users? What are the chances of a call ending, given its current duration? Here we answer these questions by studying the call duration distributions (CDDs) of individual users in large mobile networks. We analyzed a large, real network of 3.1 million users and more than one billion phone call records from a private mobile phone company of a large city, spanning 0.1TB. Our first contribution is the TLAC distribution to fit the CDD of each user; TLAC is the truncated version of the so-called log-logistic distribution, a skewed, power-law-like distribution. We show that the TLAC is an excellent fit for the overwhelming majority of our users (more than 96% of them), much better than exponential or lognormal. Our second contribution is the MetaDist to model the collective behavior of the users given their CDDs. We show that the MetaDist distribution accurately and succinctly describes the call duration behavior of users in large mobile networks. All of our methods are fast, and scale linearly with the number of customers.
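To make the second question concrete, the sketch below evaluates a log-logistic CDF truncated to a duration interval and uses it to compute the probability that a call ends within the next few seconds, given that it has already lasted a certain time. The standard log-logistic form with scale `alpha` and shape `beta` is used; the truncation bounds `lo` and `hi`, the parameter values, and the function names are assumptions for illustration and may differ from the TLAC parametrization in the paper.

```python
def loglogistic_cdf(x, alpha, beta):
    """Standard log-logistic CDF with scale alpha > 0 and shape beta > 0."""
    if x <= 0:
        return 0.0
    return 1.0 / (1.0 + (x / alpha) ** (-beta))

def truncated_cdf(x, alpha, beta, lo, hi):
    """CDF of the log-logistic restricted to (lo, hi] and renormalised."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    f_lo = loglogistic_cdf(lo, alpha, beta)
    f_hi = loglogistic_cdf(hi, alpha, beta)
    return (loglogistic_cdf(x, alpha, beta) - f_lo) / (f_hi - f_lo)

def prob_end_within(t, dt, alpha, beta, lo, hi):
    """P(call ends in (t, t + dt] | call has lasted longer than t)."""
    survival = 1.0 - truncated_cdf(t, alpha, beta, lo, hi)
    if survival <= 0.0:
        return 1.0
    ended = truncated_cdf(t + dt, alpha, beta, lo, hi) - truncated_cdf(t, alpha, beta, lo, hi)
    return ended / survival

# Hypothetical user: median-like scale of 60 s, durations between 1 s and 1 h.
p = prob_end_within(t=120, dt=30, alpha=60.0, beta=2.0, lo=1.0, hi=3600.0)
```

Fitting `alpha` and `beta` per user is a separate step that is not shown here.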

Decision Support Systems | 2015

APATE: A novel approach for automated credit card transaction fraud detection using network-based extensions

Véronique Van Vlasselaer; Cristián Bravo; Olivier Caelen; Tina Eliassi-Rad; Leman Akoglu; Monique Snoeck; Bart Baesens

In the last decade, the ease of online payment has opened up many new opportunities for e-commerce, lowering the geographical boundaries for retail. While e-commerce is still gaining popularity, it is also the playground of fraudsters who try to misuse the transparency of online purchases and the transfer of credit card records. This paper proposes APATE, a novel approach to detect fraudulent credit card transactions conducted in online stores. Our approach combines (1) intrinsic features derived from the characteristics of incoming transactions and the customer spending history using the fundamentals of RFM (Recency - Frequency - Monetary); and (2) network-based features by exploiting the network of credit card holders and merchants and deriving a time-dependent suspiciousness score for each network object. Our results show that both intrinsic and network-based features are two strongly intertwined sides of the same picture. The combination of these two types of features leads to the best performing models which reach AUC-scores higher than 0.98.
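The intrinsic features in (1) can be pictured with a minimal RFM sketch for a single card holder. The record layout (timestamp, amount), the 30-day window, and the use of the mean amount as the monetary value are assumptions for this example, not details from the paper.

```python
from datetime import datetime, timedelta

def rfm_features(history, now, window=timedelta(days=30)):
    """history: list of (timestamp, amount) for one card holder's past transactions.
    Returns recency (days since last transaction), frequency (count) and
    monetary (mean amount), all computed over the window ending at `now`."""
    recent = [(t, a) for t, a in history if timedelta(0) <= now - t <= window]
    if not recent:
        return {"recency_days": None, "frequency": 0, "monetary": 0.0}
    last = max(t for t, _ in recent)
    return {
        "recency_days": (now - last).total_seconds() / 86400.0,
        "frequency": len(recent),
        "monetary": sum(a for _, a in recent) / len(recent),
    }

# Hypothetical history: two transactions inside the window, one far outside it.
history = [
    (datetime(2015, 1, 30), 10.0),
    (datetime(2015, 1, 21), 30.0),
    (datetime(2014, 11, 1), 500.0),
]
features = rfm_features(history, now=datetime(2015, 1, 31))
```

An incoming transaction could then be compared against these values, for example by checking how far its amount lies from the holder's recent mean.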

Conference on Information and Knowledge Management | 2012

Fast and reliable anomaly detection in categorical data

Leman Akoglu; Hanghang Tong; Jilles Vreeken; Christos Faloutsos

Spotting anomalies in large multi-dimensional databases is a crucial task with many applications in finance, health care, security, etc. We introduce COMPREX, a new approach for identifying anomalies using pattern-based compression. Informally, our method finds a collection of dictionaries that describe the norm of a database succinctly, and subsequently flags those points dissimilar to the norm (those with high compression cost) as anomalies. Our approach exhibits four key features: 1) it is parameter-free; it builds dictionaries directly from data, and requires no user-specified parameters such as distance functions or density and similarity thresholds, 2) it is general; we show it works for a broad range of complex databases, including graph, image and relational databases that may contain both categorical and numerical features, 3) it is scalable; its running time grows linearly with respect to both database size and number of dimensions, and 4) it is effective; experiments on a broad range of datasets show large improvements in both compression and precision in anomaly detection, outperforming its state-of-the-art competitors.
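The idea of using compression cost as an anomaly score can be shown with a much simplified sketch. Here each feature gets its own code table with Shannon-optimal code lengths, and a record's cost is the sum of the code lengths of its values. This is an assumption-laden stand-in: it omits the search for dictionaries over groups of features that the abstract describes, and all function names are hypothetical.

```python
import math
from collections import Counter

def build_code_lengths(rows):
    """One code table per feature: value -> code length in bits, -log2(frequency)."""
    n = len(rows)
    tables = []
    for j in range(len(rows[0])):
        counts = Counter(row[j] for row in rows)
        tables.append({value: -math.log2(count / n) for value, count in counts.items()})
    return tables

def compression_cost(row, tables):
    """Total bits needed to encode one record with the per-feature tables."""
    return sum(tables[j][value] for j, value in enumerate(row))

def rank_by_cost(rows):
    """Indices of records, most expensive to compress first."""
    tables = build_code_lengths(rows)
    return sorted(
        range(len(rows)),
        key=lambda i: compression_cost(rows[i], tables),
        reverse=True,
    )

# Hypothetical categorical database: seven typical records and one rare one.
rows = [("a", "x")] * 7 + [("b", "y")]
ranking = rank_by_cost(rows)
```

Records made of rare values need long codes and therefore rise to the top of the ranking, with no distance function or threshold supplied by the user.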

Knowledge Discovery and Data Mining | 2009

Large human communication networks: patterns and a utility-driven generator

Nan Du; Christos Faloutsos; Bai Wang; Leman Akoglu

Given a real, weighted person-to-person network which changes over time, what can we say about the cliques that it contains? Do the incidents of communication, or weights on the edges of a clique, follow any pattern? Real, in-person social networks have many more triangles than chance would dictate. As it turns out, there are many more cliques than one would expect, in surprising patterns. In this paper, we study massive real-world social networks formed by direct contacts among people through various personal communication services, such as Phone-Call, SMS, IM etc. The contributions are the following: (a) we discover surprising patterns with the cliques, (b) we report power-laws of the weights on the edges of cliques, (c) our real networks follow these patterns such that we can trust them to spot outliers and finally, (d) we propose the first utility-driven graph generator for weighted time-evolving networks, which matches the observed patterns. Our study focused on three large datasets, each of which is a different type of communication service, with over one million records, and spans several months of activity.

Conference on Online Social Networks | 2015

Discovering Opinion Spammer Groups by Network Footprints

Junting Ye; Leman Akoglu

Online reviews are an important source for consumers to evaluate products/services on the Internet (e.g. Amazon, Yelp, etc.). However, more and more fraudulent reviewers write fake reviews to mislead users. To maximize their impact and share effort, many spam attacks are organized as campaigns, by a group of spammers. In this paper, we propose a new two-step method to discover spammer groups and their targeted products. First, we introduce NFS (Network Footprint Score), a new measure that quantifies the likelihood of products being spam campaign targets. Second, we carefully devise GroupStrainer to cluster spammers on a 2-hop subgraph induced by top ranking products. Our approach has four key advantages: (i) unsupervised detection; both steps require no labeled data, (ii) adversarial robustness; we quantify statistical distortions in the review network, of which spammers have only a partial view, and avoid any side information that spammers can easily evade, (iii) sensemaking; the output facilitates the exploration of the nested hierarchy (i.e., organization) among the spammers, and finally (iv) scalability; both steps have complexity linear in network size, moreover, GroupStrainer operates on a carefully induced subnetwork. We demonstrate the efficiency and effectiveness of our approach on both synthetic and real-world datasets from two different domains with millions of products and reviewers. Moreover, we discover interesting strategies that spammers employ through case studies of our detected groups.

International Conference on Data Mining | 2008

RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs

Leman Akoglu; Mary McGlohon; Christos Faloutsos

How do real, weighted graphs change over time? What patterns, if any, do they obey? Earlier studies focus on unweighted graphs, and, with few exceptions, they focus on static snapshots. Here, we report patterns we discover on several real, weighted, time-evolving graphs. The reported patterns can help in detecting anomalies in natural graphs, in making link prediction and in providing more criteria for evaluation of synthetic graph generators. We further propose an intuitive and easy way to construct weighted, time-evolving graphs. In fact, we prove that our generator will produce graphs which obey many patterns and laws observed to date. We also provide empirical evidence to support our claims.
