Deepayan Chakrabarti | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Deepayan Chakrabarti is active.

Explore More

Publication

Featured researches published by Deepayan Chakrabarti.

symposium on reliable distributed systems | 2003

Epidemic spreading in real networks: an eigenvalue viewpoint

Yang Wang; Deepayan Chakrabarti; Chenxi Wang; Christos Faloutsos

How will a virus propagate in a real network? Does an epidemic threshold exist for a finite graph? How long does it take to disinfect a network given particular values of infection rate and virus death rate? We answer the first question by providing equations that accurately model virus propagation in any network including real and synthesized network graphs. We propose a general epidemic threshold condition that applies to arbitrary graphs: we prove that, under reasonable approximations, the epidemic threshold for a network is closely related to the largest eigenvalue of its adjacency matrix. Finally, for the last question, we show that infections tend to zero exponentially below the epidemic threshold. We show that our epidemic threshold model subsumes many known thresholds for special-case graphs (e.g., Erdos-Renyi, BA power-law, homogeneous); we show that the threshold tends to zero for infinite power-law graphs. We show that our threshold condition holds for arbitrary graphs.

ACM Computing Surveys | 2006

Graph mining: Laws, generators, and algorithms

Deepayan Chakrabarti; Christos Faloutsos

How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M : N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: “How can we generate synthetic but realistic graphs?” To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey give an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.

knowledge discovery and data mining | 2006

Evolutionary clustering

Deepayan Chakrabarti; Ravi Kumar; Andrew Tomkins

We consider the problem of clustering data over time. An evolutionary clustering should simultaneously optimize two potentially conflicting criteria: first, the clustering at any point in time should remain faithful to the current data as much as possible; and second, the clustering should not shift dramatically from one timestep to the next. We present a generic framework for this problem, and discuss evolutionary versions of two widely-used clustering algorithms within this framework: k-means and agglomerative hierarchical clustering. We extensively evaluate these algorithms on real data sets and show that our algorithms can simultaneously attain both high accuracy in capturing todays data, and high fidelity in reflecting yesterdays clustering.

ACM Transactions on Information and System Security | 2008

Epidemic thresholds in real networks

Deepayan Chakrabarti; Yang Wang; Chenxi Wang; Jurij Leskovec; Christos Faloutsos

How will a virus propagate in a real network? How long does it take to disinfect a network given particular values of infection rate and virus death rate? What is the single best node to immunize? Answering these questions is essential for devising network-wide strategies to counter viruses. In addition, viral propagation is very similar in principle to the spread of rumors, information, and “fads,” implying that the solutions for viral propagation would also offer insights into these other problem settings. We answer these questions by developing a nonlinear dynamical system (NLDS) that accurately models viral propagation in any arbitrary network, including real and synthesized network graphs. We propose a general epidemic threshold condition for the NLDS system: we prove that the epidemic threshold for a network is exactly the inverse of the largest eigenvalue of its adjacency matrix. Finally, we show that below the epidemic threshold, infections die out at an exponential rate. Our epidemic threshold model subsumes many known thresholds for special-case graphs (e.g., Erdös--Rényi, BA powerlaw, homogeneous). We demonstrate the predictive power of our model with extensive experiments on real and synthesized graphs, and show that our threshold condition holds for arbitrary graphs. Finally, we show how to utilize our threshold condition for practical uses: It can dictate which nodes to immunize; it can assess the effects of a throttling policy; it can help us design network topologies so that they are more resistant to viruses.

international conference on data mining | 2005

Neighborhood formation and anomaly detection in bipartite graphs

Jimeng Sun; Huiming Qu; Deepayan Chakrabarti; Christos Faloutsos

Many real applications can be modeled using bipartite graphs, such as users vs. files in a P2P system, traders vs. stocks in a financial trading system, conferences vs. authors in a scientific publication network, and so on. We introduce two operations on bipartite graphs: 1) identifying similar nodes (Neighborhood formation), and 2) finding abnormal nodes (Anomaly detection). And we propose algorithms to compute the neighborhood for each node using random walk with restarts and graph partitioning; we also propose algorithms to identify abnormal nodes, using neighborhood information. We evaluate the quality of neighborhoods based on semantics of the datasets, and we also measure the performance of the anomaly detection algorithm with manually injected anomalies. Both effectiveness and efficiency of the methods are confirmed by experiments on several real datasets.

european conference on principles of data mining and knowledge discovery | 2004

AutoPart: parameter-free graph partitioning and outlier detection

Deepayan Chakrabarti

Graphs arise in numerous applications, such as the analysis of the Web, router networks, social networks, co-citation graphs, etc. Virtually all the popular methods for analyzing such graphs, for example, k-means clustering, METIS graph partitioning and SVD/PCA, require the user to specify various parameters such as the number of clusters, number of partitions and number of principal components. We propose a novel way to group nodes, using information-theoretic principles to choose both the number of such groups and the mapping from nodes to groups. Our algorithm is completely parameter-free, and also scales practically linearly with the problem size. Further, we propose novel algorithms which use this node group structure to get further insights into the data, by finding outliers and computing distances between groups. Finally, we present experiments on multiple synthetic and real-life datasets, where our methods give excellent, intuitive results.

international world wide web conferences | 2008

Contextual advertising by combining relevance with click feedback

Deepayan Chakrabarti; Deepak Agarwal; Vanja Josifovski

Contextual advertising supports much of the Webs ecosystem today. User experience and revenue (shared by the site publisher and the ad network) depend on the relevance of the displayed ads to the page content. As with other document retrieval systems, relevance is provided by scoring the match between individual ads (documents) and the content of the page where the ads are shown (query). In this paper we show how this match can be improved significantly by augmenting the ad-page scoring function with extra parameters from a logistic regression model on the words in the pages and ads. A key property of the proposed model is that it can be mapped to standard cosine similarity matching and is suitable for efficient and scalable implementation over inverted indexes. The model parameter values are learnt from logs containing ad impressions and clicks, with shrinkage estimators being used to combat sparsity. To scale our computations to train on an extremely large training corpus consisting of several gigabytes of data, we parallelize our fitting algorithm in a Hadoop framework [10]. Experimental evaluation is provided showing improved click prediction over a holdout set of impression and click events from a large scale real-world ad placement engine. Our best model achieves a 25% lift in precision relative to a traditional information retrieval model which is based on cosine similarity, for recalling 10% of the clicks in our test data.

IEEE Transactions on Robotics and Automation | 2004

A real-time expectation-maximization algorithm for acquiring multiplanar maps of indoor environments with mobile robots

Sebastian Thrun; Christian Martin; Yufeng Liu; Dirk Hähnel; Rosemary Emery-Montemerlo; Deepayan Chakrabarti; Wolfram Burgard

This paper presents a real-time algorithm for acquiring compact three-dimensional maps of indoor environments, using a mobile robot equipped with range and imaging sensors. Building on previous work on real-time pose estimation during mapping, our approach extends the popular expectation-maximization algorithm to multisurface models, and makes it amenable to real-time execution. Maps acquired by our algorithm consist of compact sets of textured polygons that can be visualized interactively. Experimental results obtained in corridor-type environments illustrate that compact and accurate maps can be acquired in real time and in a fully automated fashion.

international conference on machine learning | 2007

Multi-armed bandit problems with dependent arms

Sandeep Pandey; Deepayan Chakrabarti; Deepak Agarwal

We provide a framework to exploit dependencies among arms in multi-armed bandit problems, when the dependencies are in the form of a generative model on clusters of arms. We find an optimal MDP-based policy for the discounted reward case, and also give an approximation of it with formal error guarantee. We discuss lower bounds on regret in the undiscounted reward scenario, and propose a general two-level bandit policy for it. We propose three different instantiations of our general policy and provide theoretical justifications of how the regret of the instantiated policies depend on the characteristics of the clusters. Finally, we empirically demonstrate the efficacy of our policies on large-scale real-world and synthetic data, and show that they significantly outperform classical policies designed for bandits with independent arms.

international world wide web conferences | 2007

Page-level template detection via isotonic smoothing

Deepayan Chakrabarti; Ravi Kumar; Kunal Punera

We develop a novel framework for the page-level template detection problem. Our framework is built on two main ideas. The first is theautomatic generation of training data for a classifier that, given apage, assigns a templateness score to every DOM node of the page. The second is the global smoothing of these per-node classifier scores bysolving a regularized isotonic regression problem; the latter follows from a simple yet powerful abstraction of templateness on a page. Our extensive experiments on human-labeled test data show that our approachdetects templates effectively.

Explore More