Network


Latest external collaboration at the country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Prabhakar Raghavan is active.

Publication


Featured research published by Prabhakar Raghavan.


international conference on management of data | 1998

Automatic subspace clustering of high dimensional data for data mining applications

Rakesh Agrawal; Johannes Gehrke; Dimitrios Gunopulos; Prabhakar Raghavan

Data mining applications place special requirements on clustering algorithms, including: the ability to find clusters embedded in subspaces of high-dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high-dimensional datasets.
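The core of a CLIQUE-style search is a bottom-up pass over discretized subspaces: find dense one-dimensional units, then join dense units, Apriori-style, into higher-dimensional candidates. The sketch below illustrates only that idea; the grid resolution xi, the density threshold tau, and the brute-force candidate join are illustrative choices, not the paper's implementation.

```python
# Minimal sketch of bottom-up dense-unit search (CLIQUE-style), assuming
# equal-width discretization; xi and tau values are illustrative only.
from itertools import combinations
import numpy as np

def dense_units(data, xi=10, tau=0.05):
    """Return dense units per subspace: {dims: {cell: count}}."""
    n, d = data.shape
    mins, maxs = data.min(axis=0), data.max(axis=0)
    # Discretize every dimension into xi equal-width intervals.
    bins = np.clip(((data - mins) / (maxs - mins + 1e-12) * xi).astype(int), 0, xi - 1)

    # 1-dimensional dense units (cells holding at least tau * n points).
    dense = {}
    for dim in range(d):
        counts = {}
        for cell in bins[:, dim]:
            counts[(cell,)] = counts.get((cell,), 0) + 1
        units = {c: v for c, v in counts.items() if v >= tau * n}
        if units:
            dense[(dim,)] = units

    # Grow to higher-dimensional subspaces, Apriori-style: candidates are
    # built from pairs of dense k-dimensional subspaces sharing k-1 dims.
    k = 1
    while True:
        new_level = {}
        subspaces = [s for s in dense if len(s) == k]
        for a, b in combinations(subspaces, 2):
            dims = tuple(sorted(set(a) | set(b)))
            if len(dims) != k + 1 or dims in new_level:
                continue
            counts = {}
            for row in bins:
                cell = tuple(row[list(dims)])
                counts[cell] = counts.get(cell, 0) + 1
            units = {c: v for c, v in counts.items() if v >= tau * n}
            if units:
                new_level[dims] = units
        if not new_level:
            break
        dense.update(new_level)
        k += 1
    return dense

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A cluster living in a 2-D subspace of 4-D data, plus two noise dims.
    cluster = rng.normal([0.2, 0.8], 0.02, size=(200, 2))
    noise = rng.uniform(size=(200, 2))
    X = np.hstack([cluster, noise])
    for dims, units in dense_units(X).items():
        print("subspace", dims, "->", len(units), "dense units")
```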


international world wide web conferences | 2000

Graph structure in the Web

Andrei Z. Broder; Ravi Kumar; Farzin Maghoul; Prabhakar Raghavan; Sridhar Rajagopalan; Raymie Stata; Andrew Tomkins; Janet L. Wiener

The study of the web as a graph is not only fascinating in its own right, but also yields valuable insight into web algorithms for crawling, searching and community discovery, and the sociological phenomena which characterize its evolution. We report on experiments on local and global properties of the web graph using two AltaVista crawls, each with over 200 million pages and 1.5 billion links. Our study indicates that the macroscopic structure of the web is considerably more intricate than suggested by earlier experiments on a smaller scale.
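The macroscopic picture in this line of work rests on decomposing a directed graph into a strongly connected core plus the sets that reach it (IN) and are reached from it (OUT). The sketch below reproduces that decomposition on a toy graph; it uses networkx purely for convenience and is not the crawl-scale machinery from the paper.

```python
# Illustrative bow-tie-style decomposition of a small directed graph,
# assuming the largest SCC plays the role of the core.
import networkx as nx

def bow_tie(g: nx.DiGraph):
    core = max(nx.strongly_connected_components(g), key=len)
    seed = next(iter(core))
    out_side = set(nx.descendants(g, seed)) | core  # reachable from the core
    in_side = set(nx.ancestors(g, seed)) | core     # can reach the core
    return {
        "SCC": core,
        "IN": in_side - core,
        "OUT": out_side - core,
        "OTHER": set(g) - (in_side | out_side),
    }

if __name__ == "__main__":
    g = nx.DiGraph([(1, 2), (2, 3), (3, 1),   # core cycle
                    (0, 1),                    # IN
                    (3, 4), (4, 5),            # OUT
                    (6, 7)])                   # disconnected piece
    for part, nodes in bow_tie(g).items():
        print(part, sorted(nodes))
```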


international world wide web conferences | 2004

Propagation of trust and distrust

Ramanathan V. Guha; Ravi Kumar; Prabhakar Raghavan; Andrew Tomkins

A (directed) network of people connected by ratings or trust scores, and a model for propagating those trust scores, is a fundamental building block in many of today's most successful e-commerce and recommendation systems. We develop a framework of trust propagation schemes, each of which may be appropriate in certain circumstances, and evaluate the schemes on a large trust network consisting of 800K trust scores expressed among 130K people. We show that a small number of expressed trusts/distrusts per individual allows us to predict trust between any two people in the system with high accuracy. Our work appears to be the first to incorporate distrust in a computational trust propagation setting.
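The framework combines a few atomic propagation steps over the trust matrix (direct propagation, co-citation, transpose trust, trust coupling) and then iterates the combined operator. The sketch below illustrates that style of computation on a three-person toy matrix; the weights, the discount factor, and the data are made up for illustration and are not the paper's calibrated settings.

```python
# Hedged sketch of trust propagation over a toy trust matrix; the
# operator weights and discounting below are assumptions, not the
# paper's tuned values.
import numpy as np

def propagation_operator(T, weights=(0.4, 0.4, 0.1, 0.1)):
    a, b, c, d = weights
    direct = T                 # i trusts j, so follow j's trusts
    cocite = T.T @ T           # i and k trust the same people
    transpose = T.T            # trusted people trust back (weakly)
    coupling = T @ T.T         # people who trust the same targets
    return a * direct + b * cocite + c * transpose + d * coupling

def propagate(T, steps=3):
    C = propagation_operator(T)
    F = np.zeros_like(T, dtype=float)
    P = np.eye(T.shape[0])
    for k in range(1, steps + 1):
        P = P @ C
        F += (0.5 ** k) * P    # discount longer propagation paths
    return F

if __name__ == "__main__":
    # 0 trusts 1, 1 trusts 2: propagation gives 0 some trust in 2.
    T = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]], dtype=float)
    print(np.round(propagate(T), 3))
```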


international world wide web conferences | 1999

Trawling the Web for emerging cyber-communities

Ravi Kumar; Prabhakar Raghavan; Sridhar Rajagopalan; Andrew Tomkins

The Web harbors a large number of communities — groups of content-creators sharing a common interest — each of which manifests itself as a set of interlinked Web pages. Newsgroups and commercial Web directories together contain of the order of 20,000 such communities; our particular interest here is in emerging communities — those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms and algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
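Trawling looks for the footprint an emerging community leaves in the link graph: a small complete bipartite subgraph of i "fan" pages that all point to the same j "center" pages. The brute-force sketch below enumerates such (i, j) cores on a toy link table; the paper's contribution is the pruning and out-of-core engineering needed to do this on a full crawl, which the sketch does not attempt.

```python
# Brute-force enumeration of small complete bipartite cores; the toy
# link table and the parameters i, j are illustrative only.
from itertools import combinations

def bipartite_cores(outlinks, i=2, j=2):
    """Yield (fans, centers) where every fan links to every center."""
    pages = [p for p, out in outlinks.items() if len(out) >= j]
    for fans in combinations(pages, i):
        common = set.intersection(*(set(outlinks[f]) for f in fans))
        if len(common) >= j:
            for centers in combinations(sorted(common), j):
                yield fans, centers

if __name__ == "__main__":
    web = {
        "a": ["x", "y", "z"],
        "b": ["x", "y"],
        "c": ["y", "q"],
    }
    for fans, centers in bipartite_cores(web, i=2, j=2):
        print("fans", fans, "-> centers", centers)
```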


symposium on principles of database systems | 1998

Latent semantic indexing: a probabilistic analysis

Christos H. Papadimitriou; Hisao Tamaki; Prabhakar Raghavan; Santosh Vempala

Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We also propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework, as a theoretical basis for the use of spectral methods in a wider class of applications such as collaborative filtering.
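The two ingredients here have direct numerical analogues: LSI is a truncated SVD of the term-document matrix, and the speed-up is a Johnson-Lindenstrauss-style random projection applied before the SVD. The sketch below compares the two on a synthetic low-rank matrix; the matrix sizes, rank, and projection dimension are toy values, not settings from the paper.

```python
# Toy comparison of plain LSI (truncated SVD) versus random projection
# followed by LSI; all sizes below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
terms, docs, k = 1000, 200, 5

# Synthetic term-document matrix with low-rank "semantic" structure plus noise.
A = rng.random((terms, k)) @ rng.random((k, docs)) + 0.01 * rng.random((terms, docs))

# Plain LSI: keep the top-k singular directions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
docs_lsi = (np.diag(s[:k]) @ Vt[:k]).T           # documents in k-dim latent space

# Random projection first: map the term space down to d << terms, then
# run the SVD on the much smaller matrix (the speed-up idea above).
d = 50
R = rng.normal(size=(d, terms)) / np.sqrt(d)     # Johnson-Lindenstrauss style
A_small = R @ A                                   # d x docs instead of terms x docs
U2, s2, Vt2 = np.linalg.svd(A_small, full_matrices=False)
docs_rp_lsi = (np.diag(s2[:k]) @ Vt2[:k]).T

def cos_sim(X):
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

# Pairwise document similarities should be close under both representations.
print("mean similarity gap:", np.abs(cos_sim(docs_lsi) - cos_sim(docs_rp_lsi)).mean())
```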


computing and combinatorics conference | 1999

The web as a graph: measurements, models, and methods

Jon M. Kleinberg; Ravi Kumar; Prabhakar Raghavan; Sridhar Rajagopalan; Andrew Tomkins

The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons -- mathematical, sociological, and commercial -- for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.


acm conference on hypertext | 1998

Inferring Web communities from link topology

David Gibson; Jon M. Kleinberg; Prabhakar Raghavan

The World Wide Web grows through a decentralized, almost anarchic process, and this has resulted in a large hyperlinked corpus without the kind of logical organization that can be built into more traditionally-created hypermedia. To extract meaningful structure under such circumstances, we develop a notion of hyperlinked communities on the WWW through an analysis of the link topology. By invoking a simple, mathematically clean method for defining and exposing the structure of these communities, we are able to derive a number of themes: the communities can be viewed as containing a core of central, "authoritative" pages linked together by "hub pages", and they exhibit a natural type of hierarchical topic generalization that can be inferred directly from the pattern of linkage. Our investigation shows that although the process by which users of the Web create pages and links is very difficult to understand at a "local" level, it results in a much greater degree of orderly high-level structure than has typically been assumed.
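The hub/authority structure mentioned here can be illustrated with the familiar iterative computation over the link graph. The sketch below shows that iteration on an invented five-page graph; it is a generic illustration of hub and authority scoring, not the paper's community-extraction pipeline.

```python
# Generic hub/authority iteration on a toy link matrix; the graph and
# iteration count are invented for illustration.
import numpy as np

def hub_authority(adj, iters=50):
    """adj[i, j] = 1 if page i links to page j."""
    n = adj.shape[0]
    hubs = np.ones(n)
    auths = np.ones(n)
    for _ in range(iters):
        auths = adj.T @ hubs        # good authorities are pointed to by good hubs
        auths /= np.linalg.norm(auths)
        hubs = adj @ auths          # good hubs point to good authorities
        hubs /= np.linalg.norm(hubs)
    return hubs, auths

if __name__ == "__main__":
    # Pages 0-1 act as hubs linking to authorities 2-3; page 4 is isolated.
    adj = np.zeros((5, 5))
    adj[0, 2] = adj[0, 3] = adj[1, 2] = adj[1, 3] = 1
    hubs, auths = hub_authority(adj)
    print("hub scores      ", np.round(hubs, 2))
    print("authority scores", np.round(auths, 2))
```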


Combinatorica | 1987

Randomized rounding: a technique for provably good algorithms and algorithmic proofs

Prabhakar Raghavan; Clark D. Tompson

We study the relation between a class of 0–1 integer linear programs and their rational relaxations. We give a randomized algorithm for transforming an optimal solution of a relaxed problem into a provably good solution for the 0–1 problem. Our technique can be extended to provide bounds on the disparity between the rational and 0–1 optima for a given problem instance.
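A minimal way to see randomized rounding in action is on a small covering problem: solve the rational (LP) relaxation, then set each 0–1 variable to 1 with probability equal to its fractional value. The sketch below does exactly that on an invented set-cover instance using scipy's LP solver; the instance and the number of rounding rounds are illustrative, not the routing problems analyzed in the paper.

```python
# Randomized rounding sketch on a made-up set-cover instance.
import numpy as np
from scipy.optimize import linprog

# Sets over elements {0,1,2,3}; pick a low-cost collection covering all of them.
sets = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]
costs = np.array([1.0, 1.0, 1.0, 1.0])
n_elems = 4

# LP relaxation: minimize c.x subject to, for each element e,
# the sum of x_S over sets S containing e being at least 1.
A = np.array([[-1.0 if e in s else 0.0 for s in sets] for e in range(n_elems)])
b = -np.ones(n_elems)
res = linprog(costs, A_ub=A, b_ub=b, bounds=[(0, 1)] * len(sets))
x_frac = res.x

# Round: include each set with probability equal to its fractional value;
# a few independent rounds boost the chance that every element is covered.
rng = np.random.default_rng(0)
chosen = np.zeros(len(sets), dtype=bool)
for _ in range(3):
    chosen |= rng.random(len(sets)) < x_frac
covered = set().union(*(sets[i] for i in np.flatnonzero(chosen)))

print("fractional solution:", np.round(x_frac, 2))
print("rounded pick:", np.flatnonzero(chosen), "covers", sorted(covered))
```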


international world wide web conferences | 1998

Automatic resource compilation by analyzing hyperlink structure and associated text

Soumen Chakrabarti; Byron Dom; Prabhakar Raghavan; Sridhar Rajagopalan; David Gibson; Jon M. Kleinberg

We describe the design, prototyping and evaluation of ARC, a system for automatically compiling a list of authoritative Web resources on any (sufficiently broad) topic. The goal of ARC is to compile resource lists similar to those provided by Yahoo! or Infoseek. The fundamental difference is that these services construct lists either manually or through a combination of human and automated effort, while ARC operates fully automatically. We describe the evaluation of ARC, Yahoo!, and Infoseek resource lists by a panel of human users. This evaluation suggests that the resources found by ARC frequently fare almost as well as, and sometimes better than, lists of resources that are manually compiled or classified into a topic. We also provide examples of ARC resource lists for the reader to examine.


foundations of computer science | 2000

Stochastic models for the Web graph

Ravi Kumar; Prabhakar Raghavan; Sridhar Rajagopalan; D. Sivakumar; Andrew Tomkins; Eli Upfal

The Web may be viewed as a directed graph each of whose vertices is a static HTML Web page, and each of whose edges corresponds to a hyperlink from one Web page to another. We propose and analyze random graph models inspired by a series of empirical observations on the Web. Our graph models differ from the traditional G(n, p) models in two ways: 1. Independently chosen edges do not result in the statistics (degree distributions, clique multitudes) observed on the Web. Thus, edges in our model are statistically dependent on each other. 2. Our model introduces new vertices in the graph as time evolves. This captures the fact that the Web is changing with time. Our results are twofold: we show that graphs generated using our model exhibit the statistics observed on the Web graph, and additionally, that natural graph models proposed earlier do not exhibit them. This remains true even when these earlier models are generalized to account for the arrival of vertices over time. In particular, the sparse random graphs in our models exhibit properties that do not arise in far denser random graphs generated by Erdős-Rényi models.
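The evolving, edge-copying flavor of these models is easy to simulate: each new vertex picks an earlier "prototype" vertex and either copies one of its link destinations or links uniformly at random. The generator below is a hedged sketch in that spirit, with made-up parameter values; even this toy version produces a heavy-tailed in-degree distribution, which an independent-edge G(n, p) graph does not.

```python
# Copying-style evolving graph generator; n, out_deg and copy_prob are
# illustrative assumptions, not values from the paper.
import random
from collections import Counter

def copying_model(n=20000, out_deg=3, copy_prob=0.8, seed=0):
    random.seed(seed)
    edges = []
    out = {0: [0] * out_deg}            # tiny seed graph: vertex 0 self-links
    for v in range(1, n):
        proto = random.randrange(v)      # pick an earlier vertex to imitate
        targets = []
        for i in range(out_deg):
            if random.random() < copy_prob:
                targets.append(out[proto][i % len(out[proto])])  # copy its link
            else:
                targets.append(random.randrange(v))              # uniform link
        out[v] = targets
        edges.extend((v, t) for t in targets)
    return edges

if __name__ == "__main__":
    edges = copying_model()
    indeg = Counter(t for _, t in edges)
    dist = Counter(indeg.values())
    # A heavy-tailed in-degree distribution shows up even in this toy run.
    for d in sorted(dist)[:10]:
        print(f"in-degree {d}: {dist[d]} vertices")
```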

Collaboration


Dive into Prabhakar Raghavan's collaboration.
