Danai Koutra | Researchain

Publication

Featured researches published by Danai Koutra.

Data Mining and Knowledge Discovery | 2015

Graph based anomaly detection and description: a survey

Leman Akoglu; Hanghang Tong; Danai Koutra

Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have been of focus recently. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised versus (semi-)supervised approaches, for static versus dynamic graphs, for attributed versus plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.

european conference on machine learning | 2011

Unifying Guilt-by-association approaches: theorems and fast algorithms

Danai Koutra; Tai-You Ke; U Kang; Duen Horng Polo Chau; Hsing-Kuo Kenneth Pao; Christos Faloutsos

If several friends of Smith have committed petty thefts, what would you say about Smith? Most people would not be surprised if Smith is a hardened criminal. Guilt-by-association methods combine weak signals to derive stronger ones, and have been extensively used for anomaly detection and classification in numerous settings (e.g., accounting fraud, cyber-security, calling-card fraud). The focus of this paper is to compare and contrast several very successful guilt-by-association methods: Random Walk with Restarts, Semi-Supervised Learning, and Belief Propagation (BP). Our main contributions are two-fold: (a) theoretically, we prove that all the methods result in a similar matrix inversion problem; (b) for practical applications, we developed FaBP, a fast algorithm that yields a 2× speedup, equal or higher accuracy than BP, and is guaranteed to converge. We demonstrate these benefits using synthetic and real datasets, including YahooWeb, one of the largest graphs ever studied with BP.
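To make the "matrix inversion" view concrete, here is a minimal sketch of Random Walk with Restarts written as a single linear solve. It uses the common textbook formulation and is not the paper's FaBP algorithm; the function name `rwr_scores`, the `restart` parameter, and the toy path graph are assumptions chosen for illustration.

```python
import numpy as np

def rwr_scores(adj, seed, restart=0.15):
    """Random Walk with Restarts scores for one seed node (illustrative sketch).

    Solves r = (1 - restart) * W @ r + restart * e, i.e. the linear system
    (I - (1 - restart) * W) r = restart * e, where W is the column-normalized
    adjacency matrix and e is the indicator vector of the seed node.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0   # leave all-zero columns (isolated nodes) untouched
    W = adj / col_sums              # each nonzero column now sums to 1
    e = np.zeros(n)
    e[seed] = 1.0
    return np.linalg.solve(np.eye(n) - (1.0 - restart) * W, restart * e)

# Toy example (hypothetical data): an undirected path graph 0 - 1 - 2 - 3.
path = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = rwr_scores(path, seed=0)
print(scores)
```

Nodes with higher scores are more strongly associated with the seed. On very large graphs one would normally avoid a dense solve and use sparse matrices or an iterative method instead.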

user interface software and technology | 2014

Glance: rapidly coding behavioral video with the crowd

Walter S. Lasecki; Mitchell Gordon; Danai Koutra; Malte F. Jung; Steven P. Dow; Jeffrey P. Bigham

Behavioral researchers spend a considerable amount of time coding video data to systematically extract meaning from subtle human actions and emotions. In this paper, we present Glance, a tool that allows researchers to rapidly query, sample, and analyze large video datasets for behavioral events that are hard to detect automatically. Glance takes advantage of the parallelism available in paid online crowds to interpret natural language queries and then aggregates responses in a summary view of the video data. Glance provides analysts with rapid responses when initially exploring a dataset, and reliable codings when refining an analysis. Our experiments show that Glance can code nearly 50 minutes of video in 5 minutes by recruiting over 60 workers simultaneously, and can get initial feedback to analysts in under 10 seconds for most clips. We present and compare new methods for accurately aggregating the input of multiple workers marking the spans of events in video data, and for measuring the quality of their coding in real-time before a baseline is established by measuring the variance between workers. Glance's rapid responses to natural language queries, feedback regarding question ambiguity and anomalies in the data, and ability to build on prior context in followup queries allow users to have a conversation-like interaction with their data - opening up new possibilities for naturally exploring video data.

international conference on data mining | 2013

BIG-ALIGN: Fast Bipartite Graph Alignment

Danai Koutra; Hanghang Tong; David Lubensky

How can we find the virtual twin (i.e., the same or similar user) on LinkedIn for a user on Facebook? How can we effectively link an information network with a social network to support cross-network search? Graph alignment - the task of finding the node correspondences between two given graphs - is a fundamental building block in numerous application domains, such as social network analysis, bioinformatics, chemistry, and pattern recognition. In this work, we focus on aligning bipartite graphs, a problem which has been largely ignored by the extensive existing work on graph matching, despite the ubiquity of those graphs (e.g., users-groups network). We introduce a new optimization formulation and propose an effective and fast algorithm to solve it. We also propose a fast generalization of our approach to align unipartite graphs. The extensive experimental evaluations show that our method outperforms the state-of-the-art graph matching algorithms in both alignment accuracy and running time, being up to 10x more accurate or 174x faster on real graphs.

knowledge discovery and data mining | 2015

TimeCrunch: Interpretable Dynamic Graph Summarization

Neil Shah; Danai Koutra; Tianmin Zou; Brian Gallagher; Christos Faloutsos

How can we describe a large, dynamic graph over time? Is it random? If not, what are the most apparent deviations from randomness -- a dense block of actors that persists over time, or perhaps a star with many satellite nodes that appears with some fixed periodicity? In practice, these deviations indicate patterns -- for example, botnet attackers forming a bipartite core with their victims over the duration of an attack, family members bonding in a clique-like fashion over a difficult period of time, or research collaborations forming and fading away over the years. Which patterns exist in real-world dynamic graphs, and how can we find and rank them in terms of importance? These are exactly the problems we focus on in this work. Our main contributions are (a) formulation: we show how to formalize this problem as minimizing the encoding cost in a data compression paradigm, (b) algorithm: we propose TIMECRUNCH, an effective, scalable and parameter-free method for finding coherent, temporal patterns in dynamic graphs and (c) practicality: we apply our method to several large, diverse real-world datasets with up to 36 million edges and 6.3 million nodes. We show that TIMECRUNCH is able to compress these graphs by summarizing important temporal structures and finds patterns that agree with intuition.

pacific-asia conference on knowledge discovery and data mining | 2014

Com2: Fast automatic discovery of temporal ('comet') communities

Miguel Araújo; Spiros Papadimitriou; Stephan Günnemann; Christos Faloutsos; Prithwish Basu; Ananthram Swami; Evangelos E. Papalexakis; Danai Koutra

Given a large network, changing over time, how can we find patterns and anomalies? We propose Com2, a novel and fast, incremental tensor analysis approach, which can discover both transient and periodic/repeating communities. The method is (a) scalable, being linear in the input size, (b) general, (c) free of user-defined parameters, and (d) effective, returning results that agree with intuition.

advances in social networks analysis and mining | 2013

Network similarity via multiple social theories

Michele Berlingerio; Danai Koutra; Tina Eliassi-Rad; Christos Faloutsos

Given a set of k networks, possibly with different sizes and no overlaps in nodes or links, how can we quickly assess similarity between them? Analogously, is there a set of social theories which, when represented by a small number of descriptive, numerical features, effectively serve as a “signature” for the network? Having such signatures will enable a wealth of graph mining and social network analysis tasks, including clustering, outlier detection, visualization, etc. We propose a novel, effective, and scalable method, called NetSimile, for solving the above problem. Our approach has the following desirable properties: (a) It is supported by a set of social theories. (b) It gives similarity scores that are size-invariant. (c) It is scalable, being linear in the number of links for graph signature extraction. In extensive experiments on numerous synthetic and real networks from disparate domains, NetSimile outperforms baseline competitors. We also demonstrate how our approach enables several mining tasks such as clustering, visualization, discontinuity detection, network transfer learning, and re-identification across networks.

ACM Transactions on Knowledge Discovery From Data | 2016

DeltaCon: Principled Massive-Graph Similarity Function with Attribution

Danai Koutra; Neil Shah; Joshua T. Vogelstein; Brian Gallagher; Christos Faloutsos

How much has a network changed since yesterday? How different is the wiring of Bob’s brain (a left-handed male) and Alice’s brain (a right-handed female), and how is it different? Graph similarity with given node correspondence, i.e., the detection of changes in the connectivity of graphs, arises in numerous settings. In this work, we formally state the axioms and desired properties of the graph similarity functions, and evaluate when state-of-the-art methods fail to detect crucial connectivity changes in graphs. We propose DeltaCon, a principled, intuitive, and scalable algorithm that assesses the similarity between two graphs on the same nodes (e.g., employees of a company, customers of a mobile carrier). In conjunction, we propose DeltaCon-Attr, a related approach that enables attribution of change or dissimilarity to responsible nodes and edges. Experiments on various synthetic and real graphs showcase the advantages of our method over existing similarity measures. Finally, we employ DeltaCon and DeltaCon-Attr on real applications: (a) we classify people into groups of high and low creativity based on their brain connectivity graphs, (b) we perform temporal anomaly detection in the who-emails-whom Enron graph and find the top culprits for the changes in the temporal corporate email graph, and (c) we recover pairs of test-retest large brain scans (∼17M edges, up to 90M edges) for 21 subjects.

advances in social networks analysis and mining | 2015

If walls could talk: Patterns and anomalies in Facebook wallposts

Pravallika Devineni; Danai Koutra; Michalis Faloutsos; Christos Faloutsos

How do people interact with their Facebook wall? At a high level, this question captures the essence of our work. While most prior efforts focus on Twitter, the far fewer Facebook studies focus on the friendship graph or are limited by the number of users or the duration of the study. In this work, we model Facebook user behavior: we analyze the wall activities of users, focusing on identifying common patterns and surprising phenomena. We conduct an extensive study of roughly 7K users over three years during four-month intervals each year. We propose PowerWall, a lesser-known heavy-tailed distribution, to fit our data. Our key results can be summarized in the following points. First, we find that many wall activities, including number of posts, number of likes, number of posts of type photo, etc., can be described by the PowerWall distribution. What is more surprising is that most of these distributions have a similar slope, with a value close to 1! Second, we show how our patterns and metrics can help us spot surprising behaviors and anomalies. For example, we find a user posting exactly the same number of posts every two days; another user posts at midnight, with no other activity before or after. Our work provides a solid step towards a systematic and quantitative wall-centric profiling of Facebook user activity.

pacific-asia conference on knowledge discovery and data mining | 2014

Net-Ray: Visualizing and Mining Billion-Scale Graphs

U Kang; Jay Yoon Lee; Danai Koutra; Christos Faloutsos

How can we visualize billion-scale graphs? How can we spot outliers in such graphs quickly? Visualizing graphs is the most direct way of understanding them; however, billion-scale graphs are very difficult to visualize since the amount of information overflows the resolution of a typical screen.
