Sayan Ranu

Indian Institute of Technology Madras

Publication

Featured research published by Sayan Ranu.

International Conference on Data Engineering | 2009

GraphSig: A Scalable Approach to Mining Significant Subgraphs in Large Graph Databases

Sayan Ranu; Ambuj K. Singh

Graphs are being increasingly used to model a wide range of scientific data. Such widespread usage of graphs has generated considerable interest in mining patterns from graph databases. While an array of techniques exists to mine frequent patterns, we still lack a scalable approach to mine statistically significant patterns, specifically, patterns with low p-values that occur at low frequencies. We propose a highly scalable technique, called GraphSig, to mine significant subgraphs from large graph databases. We convert each graph into a set of feature vectors where each vector represents a region within the graph. Domain knowledge is used to select a meaningful feature set. Prior probabilities of features are computed empirically to evaluate the statistical significance of patterns in the feature space. Following analysis in the feature space, only a small portion of the exponential search space is accessed for further analysis. This enables the use of existing frequent subgraph mining techniques to mine significant patterns in a scalable manner even when they are infrequent. Extensive experiments are carried out on the proposed techniques, and empirical results demonstrate that GraphSig is effective and efficient for mining significant patterns. To further demonstrate the power of significant patterns, we develop a classifier using patterns mined by GraphSig. Experimental results show that the proposed classifier achieves superior performance, both in terms of quality and computation cost, over state-of-the-art classifiers.
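The idea of testing a feature-space pattern against empirically computed priors can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the published GraphSig implementation: it assumes a pattern is a sub-feature vector that a region "matches" when every feature value is at least the pattern's value, that the prior is a product of per-feature empirical probabilities (features treated as independent), and that significance is a binomial tail probability over the observed support. The names `support`, `empirical_prior`, and `binomial_p_value` are hypothetical.

```python
from math import comb


def support(pattern, vectors):
    # Count feature vectors that dominate the pattern on every feature
    # (the assumed notion of a sub-feature-vector match).
    return sum(all(x >= v for x, v in zip(vec, pattern)) for vec in vectors)


def empirical_prior(pattern, vectors):
    # ASSUMPTION: features are independent, so the prior probability of a match
    # is the product of the per-feature empirical probabilities P(feature_i >= pattern_i).
    prior = 1.0
    for i, v in enumerate(pattern):
        prior *= sum(vec[i] >= v for vec in vectors) / len(vectors)
    return prior


def binomial_p_value(k, n, p):
    # P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k matches
    # among n vectors if the pattern occurred only at its prior rate.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Toy feature vectors (hypothetical data), one per graph region.
vectors = [(2, 0, 1), (1, 1, 0), (0, 2, 1), (2, 1, 1)]
pattern = (1, 1, 1)
k, n = support(pattern, vectors), len(vectors)
p_value = binomial_p_value(k, n, empirical_prior(pattern, vectors))
```

A low p-value flags a pattern whose support exceeds what its prior predicts, which is how a pattern can be significant even when its absolute frequency is small.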

International World Wide Web Conferences | 2012

Recommendations to boost content spread in social networks

Vineet Chaoji; Sayan Ranu; Rajeev Rastogi; Rushi Bhatt

Content sharing in social networks is a powerful mechanism for discovering content on the Internet. The degree to which content is disseminated within the network depends on the connectivity relationships among network nodes. Existing schemes for recommending connections in social networks are based on the number of common neighbors, similarity of user profiles, etc. However, such similarity-based connections do not consider the amount of content discovered. In this paper, we propose novel algorithms for recommending connections that boost content propagation in a social network without compromising on the relevance of the recommendations. Unlike existing work on influence propagation, in our environment, we are looking for edges instead of nodes, with a bound on the number of incident edges per node. We show that the content spread function is not submodular, and develop approximation algorithms for computing a near-optimal set of edges. Through experiments on real-world social graphs such as Flickr and Twitter, we show that our approximation algorithms achieve content spreads that are as much as 90 times higher compared to existing heuristics for recommending connections.

International Conference on Data Engineering | 2015

Indexing and matching trajectories under inconsistent sampling rates

Sayan Ranu; Deepak P; Aditya Telang; Prasad M. Deshpande; Sriram Raghavan

Quantifying the similarity between two trajectories is a fundamental operation in the analysis of spatio-temporal databases. While a number of distance functions exist, a recent shift in how trajectories are generated violates one of their core assumptions: a consistent and uniform sampling rate. In this paper, we formulate a robust distance function called Edit Distance with Projections (EDwP) to match trajectories under inconsistent and variable sampling rates through dynamic interpolation. This is achieved by using projections, which go beyond matching only the sampled points when aligning trajectories. To enable efficient trajectory retrieval using EDwP, we design an index structure called TrajTree. TrajTree derives its pruning power from a unique combination of bounding boxes and Lipschitz embedding. Extensive experiments on real trajectory databases show EDwP to be up to 5 times more accurate than state-of-the-art distance functions. Additionally, TrajTree improves the efficiency of trajectory retrieval by up to an order of magnitude over existing techniques.
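The pruning idea behind a Lipschitz embedding can be illustrated in a simpler setting. The sketch below is an assumption-laden illustration, not TrajTree: it embeds each object as its distances to a few reference sets and uses the largest coordinate difference as a lower bound on the true distance, so a range query can skip objects without computing the exact distance. That bound is guaranteed only when the distance satisfies the triangle inequality, so the example uses Euclidean distance between 2-D points rather than EDwP; the function names and reference sets are hypothetical.

```python
import math


def lipschitz_embed(x, reference_sets, dist):
    # Coordinate i is the distance from x to the closest member of reference set i.
    return [min(dist(x, r) for r in ref) for ref in reference_sets]


def lower_bound(ex, ey):
    # For a metric dist, |d(x, R) - d(y, R)| <= d(x, y) for every reference set R,
    # so the largest coordinate gap never overestimates the true distance.
    return max(abs(a - b) for a, b in zip(ex, ey))


def range_query(query, objects, embeddings, reference_sets, dist, radius):
    # Return objects within `radius` of `query` and the number of exact distance calls.
    eq = lipschitz_embed(query, reference_sets, dist)
    matches, exact_calls = [], 0
    for obj, emb in zip(objects, embeddings):
        if lower_bound(eq, emb) > radius:
            continue  # pruned: even the lower bound is too large
        exact_calls += 1
        if dist(query, obj) <= radius:
            matches.append(obj)
    return matches, exact_calls


objects = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
reference_sets = [[(0.0, 0.0)], [(10.0, 10.0)]]
# Embeddings are computed once, at index-build time.
embeddings = [lipschitz_embed(o, reference_sets, math.dist) for o in objects]
matches, exact_calls = range_query((0.5, 0.0), objects, embeddings, reference_sets, math.dist, 1.0)
```

Here two of the four points are discarded by the embedding alone, and the exact distance is evaluated only for the two survivors.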

International Conference on Management of Data | 2014

Answering top-k representative queries on graph databases

Sayan Ranu; Minh X. Hoang; Ambuj K. Singh

Given a function that classifies a data object as relevant or irrelevant, we consider the task of selecting k objects that best represent all relevant objects in the underlying database. This problem arises naturally when analysts want to familiarize themselves with the relevant objects in a database through a small set of k exemplars. In this paper, we solve the problem of top-k representative queries on graph databases. While graph databases model a wide range of scientific data, solving the problem in the context of graphs presents unique challenges due to the inherent complexity of matching structures. Furthermore, top-k representative queries map to the classic Set Cover problem, making the problem NP-hard. To overcome these challenges, we develop a greedy approximation with theoretical guarantees on the quality of the answer set, noting that a better approximation is not feasible in polynomial time. To reduce the quadratic computational cost of the greedy algorithm, we propose an index structure called NB-Index that indexes the θ-neighborhoods of the database graphs by employing a novel combination of Lipschitz embedding and agglomerative clustering. Extensive experiments on real graph datasets validate the efficiency and effectiveness of the proposed techniques, which achieve up to two orders of magnitude speed-up over state-of-the-art algorithms.
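The greedy step for a coverage-style formulation can be sketched as follows. This is a generic greedy maximum-coverage routine, shown under the assumption that each candidate graph's θ-neighborhood (the relevant graphs it represents) has already been computed; it does not model NB-Index, and the name `greedy_representatives` and the toy data are hypothetical. For greedy maximum coverage, the standard guarantee is a (1 − 1/e) fraction of the optimal coverage.

```python
def greedy_representatives(neighborhoods: dict[str, set[str]], k: int) -> list[str]:
    """Pick up to k candidates, each time taking the one that covers the most
    not-yet-covered relevant objects (greedy maximum coverage)."""
    covered: set[str] = set()
    chosen: list[str] = []
    candidates = dict(neighborhoods)  # copy so the caller's dict is untouched
    for _ in range(k):
        best = max(candidates, key=lambda c: len(candidates[c] - covered), default=None)
        if best is None or not candidates[best] - covered:
            break  # nothing left to gain
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen


# Hypothetical precomputed θ-neighborhoods: candidate graph -> relevant graphs it covers.
neighborhoods = {
    "g1": {"a", "b", "c"},
    "g2": {"c", "d"},
    "g3": {"d", "e", "f", "g"},
    "g4": {"a"},
}
```

With k = 2 the sketch picks `g3` (four new objects) and then `g1` (three new objects); asking for a third pick adds nothing because every remaining candidate is already fully covered.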

Journal of Chemical Information and Modeling | 2009

Mining Statistically Significant Molecular Substructures for Efficient Molecular Classification

Sayan Ranu; Ambuj K. Singh

The increased availability of large repositories of chemical compounds has created new challenges in designing efficient molecular querying and mining systems. Molecular classification is an important problem in drug development where libraries of chemical compounds are screened and molecules with the highest probability of success against a given target are selected. We have developed a technique called GraphSig to mine significantly over-represented molecular substructures in a given class of molecules. GraphSig successfully overcomes the scalability bottleneck of mining patterns at a low frequency. Patterns mined by GraphSig display correlation with biological activities and serve as an excellent platform on which to build molecular analysis tools. The potential of GraphSig as a chemical descriptor is explored, and support vector machines are used to classify molecules described by patterns mined using GraphSig. Furthermore, the over-represented patterns are more informative than features generated exhaustively by traditional fingerprints, which gives them potential for providing scaffolds and for lead generation. Extensive experiments are carried out to evaluate the proposed techniques, and empirical results show promising performance in terms of classification quality. An implementation of the algorithm is available free for academic use at http://www.uweb.ucsb.edu/~sayan/software/GraphSig.tar.

International Conference on Data Mining | 2014

Inferring Uncertain Trajectories from Partial Observations

Prithu Banerjee; Sayan Ranu; Sriram Raghavan

The explosion in the availability of GPS-enabled devices has resulted in an abundance of trajectory data. In reality, however, the majority of these trajectories are collected at a low sampling rate and provide only partial observations of the routes actually traversed. Consequently, they are mired in uncertainty. In this paper, we develop a technique called InferTra to infer uncertain trajectories from network-constrained partial observations. Rather than predicting the most likely route, the inferred uncertain trajectory takes the form of an edge-weighted graph and summarizes all probable routes in a holistic manner. For trajectory inference, InferTra employs Gibbs sampling by learning a Network Mobility Model (NMM) from a database of historical trajectories. Extensive experiments on real trajectory databases show that the graph-based approach of InferTra is up to 50% more accurate, 20 times faster, and immensely more versatile than state-of-the-art techniques.

Knowledge Discovery and Data Mining | 2013

Mining discriminative subgraphs from global-state networks

Sayan Ranu; Minh X. Hoang; Ambuj K. Singh

Global-state networks provide a powerful mechanism to model the increasing heterogeneity in data generated by current systems. Such a network comprises a series of network snapshots with dynamic local states at nodes, and a global network state indicating the occurrence of an event. Mining discriminative subgraphs from global-state networks allows us to identify the influential sub-networks that have maximum impact on the global state and unearth the complex relationships between the local entities of a network and their collective behavior. In this paper, we explore this problem and design a technique called MINDS to mine minimally discriminative subgraphs from large global-state networks. To combat the exponential subgraph search space, we derive the concept of an edit map and perform Metropolis-Hastings sampling on it to compute the answer set. Furthermore, we formulate the idea of network-constrained decision trees to learn prediction models that adhere to the underlying network structure. Extensive experiments on real datasets demonstrate excellent prediction accuracy. Additionally, MINDS achieves a speed-up of at least four orders of magnitude over baseline techniques.

Very Large Data Bases | 2011

Answering top-k queries over a mixture of attractive and repulsive dimensions

Sayan Ranu; Ambuj K. Singh

In this paper, we formulate a top-k query that compares objects in a database to a user-provided query object using a novel scoring function. The proposed scoring function combines the idea of attractive and repulsive dimensions into a general framework to overcome the weakness of traditional distance or similarity measures. We study the properties of the proposed class of scoring functions and develop efficient and scalable index structures that index the isolines of the function. We demonstrate various scenarios where the query finds application. Empirical evaluation demonstrates a performance gain of one to two orders of magnitude in querying time over existing state-of-the-art top-k techniques. Further, a qualitative analysis is performed on a real dataset to highlight the potential of the proposed query in discovering hidden data characteristics.
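To make "attractive" and "repulsive" dimensions concrete, here is a brute-force sketch with a hypothetical scoring function: distance to the query on attractive dimensions counts against an object, while distance on repulsive dimensions counts in its favor. The abstract does not give the paper's actual class of scoring functions, and this sketch makes no attempt at its isoline-based indexes; `mixed_score`, `top_k`, and the unweighted absolute-difference form are all assumptions.

```python
import heapq


def mixed_score(obj, query, attractive, repulsive):
    # HYPOTHETICAL score (lower is better): stay close to the query on attractive
    # dimensions, move away from it on repulsive dimensions.
    pull = sum(abs(obj[i] - query[i]) for i in attractive)
    push = sum(abs(obj[j] - query[j]) for j in repulsive)
    return pull - push


def top_k(objects, query, attractive, repulsive, k):
    # Linear scan; an index would avoid scoring every object.
    return heapq.nsmallest(k, objects, key=lambda o: mixed_score(o, query, attractive, repulsive))


objects = [(0, 0), (0, 5), (3, 5), (1, 1)]
best = top_k(objects, query=(0, 0), attractive=[0], repulsive=[1], k=2)
```

With dimension 0 attractive and dimension 1 repulsive, `(0, 5)` ranks first because it matches the query exactly on the attractive dimension while being far from it on the repulsive one.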

International Conference on Management of Data | 2017

Debunking the Myths of Influence Maximization: An In-Depth Benchmarking Study

Akhil Arora; Sainyam Galhotra; Sayan Ranu

Influence maximization (IM) on social networks is one of the most active areas of research in computer science. While various IM techniques proposed over the last decade have definitely enriched the field, unfortunately, experimental reports on existing techniques fall short in validity and integrity, since many comparisons are not based on a common platform or are merely discussed in theory. In this paper, we perform an in-depth benchmarking study of IM techniques on social networks. Specifically, we design a benchmarking platform, which enables us to evaluate and compare the existing techniques systematically and thoroughly under identical experimental conditions. Our benchmarking results analyze and diagnose the inherent deficiencies of the existing approaches and surface the open challenges in IM even after a decade of research. More fundamentally, we unearth and debunk a series of myths and establish that there is no single state-of-the-art technique in IM. At best, a technique is the state of the art in only one aspect.
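For readers new to IM, the classic baseline that many of the benchmarked techniques build on or try to speed up is greedy seed selection with Monte Carlo estimates of spread under the independent cascade (IC) model. The sketch below is that textbook baseline, not the paper's benchmarking platform; the uniform edge probability `p`, the adjacency-dict graph format, and the function names are assumptions for illustration.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    # One independent-cascade run: each newly activated node gets one chance
    # to activate each inactive out-neighbor, succeeding with probability p.
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)


def estimate_spread(graph, seeds, p, rounds, rng):
    # Monte Carlo estimate of the expected number of activated nodes.
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(rounds)) / rounds


def greedy_im(graph, k, p=0.1, rounds=1000, seed=0):
    # Repeatedly add the node with the largest estimated spread when combined
    # with the seeds chosen so far. Cost: about k * |V| * rounds simulations.
    rng = random.Random(seed)
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    seeds = []
    for _ in range(k):
        best = max(
            (n for n in sorted(nodes) if n not in seeds),
            key=lambda n: estimate_spread(graph, seeds + [n], p, rounds, rng),
        )
        seeds.append(best)
    return seeds


graph = {0: [1, 2], 1: [3], 4: [5]}
```

With `p=1.0` every edge fires, so the spread of a seed set is simply the number of nodes reachable from it; in that case `greedy_im(graph, 2, p=1.0, rounds=1)` selects node 0 (reaching four nodes) and then node 4. The cost of repeated simulation is one reason so many faster IM techniques have been proposed.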

Molecular Informatics | 2011

Probabilistic Substructure Mining From Small-Molecule Screens

Sayan Ranu; Bradley T. Calhoun; Ambuj K. Singh; S. Joshua Swamidass

Identifying the overrepresented substructures from a set of molecules with similar activity is a common task in chemical informatics. Existing substructure miners are deterministic, requiring the activity of all mined molecules to be known with high confidence. In contrast, we introduce pGraphSig, a probabilistic structure miner, which effectively mines structures from noisy data, where many molecules are labeled with their probability of being active. We benchmark pGraphSig on data from several small-molecule high-throughput screens, finding that it can identify overrepresented structures more effectively than a deterministic structure miner.
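When labels are probabilities rather than yes/no, even a simple quantity like "how many active molecules contain this substructure" becomes a random variable. The sketch below shows one standard way to handle that, assuming the molecules' activities are independent Bernoulli events: the count then follows a Poisson binomial distribution, computed here by dynamic programming. It illustrates the probabilistic-label setting only and is not pGraphSig's actual statistic; the names `poisson_binomial_pmf`, `active_support_stats`, and the toy molecule IDs are hypothetical.

```python
def poisson_binomial_pmf(probs):
    # pmf[j] = P(exactly j successes) for independent Bernoulli(probs[i]) trials.
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for j, mass in enumerate(pmf):
            nxt[j] += mass * (1 - p)  # this molecule is inactive
            nxt[j + 1] += mass * p    # this molecule is active
        pmf = nxt
    return pmf


def active_support_stats(contains, activity_prob, min_actives):
    # ASSUMPTION: activities are independent. Returns the expected number of
    # active molecules containing the substructure and P(count >= min_actives).
    probs = [activity_prob[m] for m in contains]
    pmf = poisson_binomial_pmf(probs)
    return sum(probs), sum(pmf[min_actives:])


activity_prob = {"m1": 0.9, "m2": 0.5, "m3": 0.2}  # hypothetical screen output
expected, prob_at_least_2 = active_support_stats(["m1", "m2"], activity_prob, 2)
```

For a substructure found in `m1` and `m2`, the expected number of actives is 1.4, and the probability that both are truly active is 0.45.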