Publication


Featured research published by Sridhar Rajagopalan.


International World Wide Web Conference | 2000

Graph structure in the Web

Andrei Z. Broder; Ravi Kumar; Farzin Maghoul; Prabhakar Raghavan; Sridhar Rajagopalan; Raymie Stata; Andrew Tomkins; Janet L. Wiener

The study of the web as a graph is not only fascinating in its own right, but also yields valuable insight into web algorithms for crawling, searching and community discovery, and the sociological phenomena which characterize its evolution. We report on experiments on local and global properties of the web graph using two AltaVista crawls, each with over 200 million pages and 1.5 billion links. Our study indicates that the macroscopic structure of the web is considerably more intricate than suggested by earlier experiments on a smaller scale.
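
The macroscopic picture described here rests on connectivity measurements such as the size of the largest strongly connected component of the crawl. The sketch below shows that kind of measurement on a toy directed graph using Kosaraju's algorithm; the edge list and function name are purely illustrative stand-ins for a crawl of hundreds of millions of pages.

```python
# Sketch: size of the largest strongly connected component of a directed
# graph (Kosaraju's algorithm). The toy edge list is illustrative only.
from collections import defaultdict

def largest_scc(edges):
    graph, rgraph, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        graph[u].append(v)
        rgraph[v].append(u)
        nodes.update((u, v))

    def dfs(start, adj, seen, out):
        # Iterative DFS that appends nodes in post-order.
        stack = [(start, iter(adj[start]))]
        seen.add(start)
        while stack:
            node, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(adj[w])))
                    break
            else:
                stack.pop()
                out.append(node)

    order, seen = [], set()
    for u in nodes:
        if u not in seen:
            dfs(u, graph, seen, order)

    best, seen = [], set()
    for u in reversed(order):          # reverse post-order on the reversed graph
        if u not in seen:
            comp = []
            dfs(u, rgraph, seen, comp)
            if len(comp) > len(best):
                best = comp
    return best

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e")]
print(sorted(largest_scc(edges)))  # ['a', 'b', 'c'] -- the strongly connected "core"
```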


International World Wide Web Conference | 1999

Trawling the Web for emerging cyber-communities

Ravi Kumar; Prabhakar Raghavan; Sridhar Rajagopalan; Andrew Tomkins

The Web harbors a large number of communities — groups of content-creators sharing a common interest — each of which manifests itself as a set of interlinked Web pages. Newsgroups and commercial Web directories together contain on the order of 20,000 such communities; our particular interest here is in emerging communities — those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
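
The graph-theoretic signature behind trawling is a small complete bipartite core: a set of "fan" pages that all link to the same set of "center" pages. Below is a brute-force sketch of finding such cores in a toy link structure; the enumeration strategy and the data are illustrative only, since the paper's focus is on making this enumeration feasible over a full Web crawl.

```python
# Sketch: brute-force enumeration of small complete bipartite "cores"
# (i fan pages that all link to at least j common center pages).
# The link data is illustrative; real trawling works over a full crawl.
from itertools import combinations

def find_cores(out_links, i, j):
    cores = []
    fans = [p for p, links in out_links.items() if len(links) >= j]
    for group in combinations(fans, i):
        common = set.intersection(*(out_links[p] for p in group))
        if len(common) >= j:
            cores.append((group, frozenset(common)))
    return cores

out_links = {
    "fan1": {"hub_a", "hub_b", "hub_c"},
    "fan2": {"hub_a", "hub_b", "hub_d"},
    "fan3": {"hub_a", "hub_b", "hub_c"},
    "other": {"hub_x"},
}
print(find_cores(out_links, i=3, j=2))  # fans 1-3 share centers hub_a and hub_b
```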


Computing and Combinatorics Conference | 1999

The web as a graph: measurements, models, and methods

Jon M. Kleinberg; Ravi Kumar; Prabhakar Raghavan; Sridhar Rajagopalan; Andrew Tomkins

The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons -- mathematical, sociological, and commercial -- for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.
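
One of the link-analysis algorithms in this line of work computes mutually reinforcing hub and authority scores (HITS-style iteration). A minimal sketch on a toy link graph follows; the graph, iteration count, and normalization details are illustrative, not the paper's experimental setup.

```python
# Sketch: HITS-style hub/authority iteration on a tiny link graph.
# The graph and the number of iterations are illustrative only.
import math

def hits(links, iterations=20):
    """links: dict page -> set of pages it points to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of pages linking to the page.
        auth = {p: sum(hub[q] for q in links if p in links[q]) for p in pages}
        # Hub score: sum of authority scores of pages the page links to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # Normalize to keep the scores bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

links = {"a": {"c"}, "b": {"c"}, "c": {"d"}, "d": set()}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # 'c' -- the page most pointed to by good hubs
```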


International World Wide Web Conference | 1998

Automatic resource compilation by analyzing hyperlink structure and associated text

Soumen Chakrabarti; Byron Dom; Prabhakar Raghavan; Sridhar Rajagopalan; David Gibson; Jon M. Kleinberg

We describe the design, prototyping and evaluation of ARC, a system for automatically compiling a list of authoritative Web resources on any (sufficiently broad) topic. The goal of ARC is to compile resource lists similar to those provided by Yahoo! or Infoseek. The fundamental difference is that these services construct lists either manually or through a combination of human and automated effort, while ARC operates fully automatically. We describe the evaluation of ARC, Yahoo!, and Infoseek resource lists by a panel of human users. This evaluation suggests that the resources found by ARC frequently fare almost as well as, and sometimes better than, lists of resources that are manually compiled or classified into a topic. We also provide examples of ARC resource lists for the reader to examine.
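
ARC's key idea is to let the text around a hyperlink influence the link-analysis computation. The sketch below assumes a simplified scheme in which each edge is weighted by how many topic terms appear in its anchor text before hub/authority scores are computed; the window handling and weighting in ARC itself differ, and the example data is invented.

```python
# Sketch: weighting hyperlinks by the topic terms found in their anchor text,
# as a precursor to a hub/authority computation. The weighting scheme and the
# example edges are simplified stand-ins, not ARC's actual procedure.
def edge_weight(anchor_text, topic_terms):
    """Weight an edge by (1 + number of topic terms in its anchor text)."""
    words = set(anchor_text.lower().split())
    return 1 + sum(1 for t in topic_terms if t in words)

def weighted_graph(edges, topic_terms):
    """edges: list of (src, dst, anchor_text). Returns {(src, dst): weight}."""
    return {(u, v): edge_weight(text, topic_terms) for u, v, text in edges}

edges = [
    ("portal", "bike-reviews", "best road cycling reviews"),
    ("portal", "weather", "local weather"),
]
print(weighted_graph(edges, topic_terms={"cycling", "bike", "road"}))
# {('portal', 'bike-reviews'): 3, ('portal', 'weather'): 1}
```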


Foundations of Computer Science | 2000

Stochastic models for the Web graph

Ravi Kumar; Prabhakar Raghavan; Sridhar Rajagopalan; D. Sivakumar; Andrew Tomkins; Eli Upfal

The Web may be viewed as a directed graph each of whose vertices is a static HTML Web page, and each of whose edges corresponds to a hyperlink from one Web page to another. We propose and analyze random graph models inspired by a series of empirical observations on the Web. Our graph models differ from the traditional G(n,p) models in two ways: 1. Independently chosen edges do not result in the statistics (degree distributions, clique multitudes) observed on the Web. Thus, edges in our model are statistically dependent on each other. 2. Our model introduces new vertices in the graph as time evolves. This captures the fact that the Web is changing with time. Our results are twofold: we show that graphs generated using our model exhibit the statistics observed on the Web graph, and additionally, that natural graph models proposed earlier do not exhibit them. This remains true even when these earlier models are generalized to account for the arrival of vertices over time. In particular, the sparse random graphs in our models exhibit properties that do not arise in far denser random graphs generated by Erdős-Rényi models.
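
A minimal sketch of a "copying"-style generator in the spirit of the models described here: a new page copies some out-links of a randomly chosen existing page and picks the rest uniformly at random, so edges are statistically dependent and the graph grows over time. The parameters, seed graph, and function name are illustrative simplifications, not the paper's exact evolution rules.

```python
# Sketch: a "copying"-style random graph generator. A new page imitates a
# randomly chosen existing page for most of its out-links, producing the
# skewed in-degrees seen on the Web. Parameters are illustrative only.
import random

def copying_graph(num_nodes, out_degree=3, copy_prob=0.8, seed=0):
    rng = random.Random(seed)
    # Seed graph: a small ring so early nodes have links to copy.
    links = {v: [(v + 1) % (out_degree + 1)] * out_degree
             for v in range(out_degree + 1)}
    for v in range(out_degree + 1, num_nodes):
        prototype = rng.randrange(v)          # existing page to imitate
        targets = []
        for slot in range(out_degree):
            if rng.random() < copy_prob:
                targets.append(links[prototype][slot])   # copy the prototype's link
            else:
                targets.append(rng.randrange(v))         # uniformly random page
        links[v] = targets
    return links

g = copying_graph(1000)
in_degree = {}
for targets in g.values():
    for t in targets:
        in_degree[t] = in_degree.get(t, 0) + 1
print(max(in_degree.values()))  # a few pages accumulate very high in-degree
```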


International World Wide Web Conference | 2003

SemTag and seeker: bootstrapping the semantic web via automated semantic annotation

Stephen Dill; Nadav Eiron; David Gibson; Daniel Gruhl; Ramanathan V. Guha; Anant Jhingran; Tapas Kanungo; Sridhar Rajagopalan; Andrew Tomkins; John A. Tomlin; Jason Y. Zien

This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. To our knowledge, this is the largest-scale semantic tagging effort to date. We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.
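
A minimal sketch of the flavour of disambiguation involved: compare the words around a mention with a context profile for each candidate taxonomy node and keep the best match only if it clears a threshold. The profiles, threshold, and candidate entities below are invented for illustration; SemTag's actual disambiguation algorithm and evaluation are more involved.

```python
# Sketch: disambiguating a mention by cosine similarity between its context
# and a word profile for each candidate taxonomy node. Profiles, threshold,
# and candidates are illustrative, not SemTag's actual data or algorithm.
from collections import Counter
import math

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def disambiguate(context_words, candidates, threshold=0.1):
    """candidates: dict taxonomy_node -> Counter of profile words."""
    ctx = Counter(context_words)
    best, score = None, 0.0
    for node, profile in candidates.items():
        s = cosine(ctx, profile)
        if s > score:
            best, score = node, s
    return best if score >= threshold else None  # None = leave the mention untagged

candidates = {
    "Jaguar/Animal": Counter({"cat": 3, "wild": 2, "prey": 2}),
    "Jaguar/Car": Counter({"engine": 3, "sedan": 2, "drive": 2}),
}
print(disambiguate("the wild cat stalked its prey".split(), candidates))
# Jaguar/Animal
```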


Journal of Computer and System Sciences | 2001

Recommendation Systems

Ravi Kumar; Prabhakar Raghavan; Sridhar Rajagopalan; Andrew Tomkins

A recommendation system tracks past actions of a group of users to make recommendations to individual members of the group. The growth of computer-mediated marketing and commerce has led to increased interest in such systems. We introduce a simple analytical framework for recommendation systems, including a basis for defining the utility of such a system. We perform probabilistic analyses of algorithms within this framework. These analyses yield insights into how much utility can be derived from knowledge of past user actions.
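
For concreteness, here is a toy recommender driven purely by past user actions: recommend the item most often co-chosen with things the target user has already chosen. This heuristic only illustrates the setting; the paper's contribution is the analytical framework and utility measure for such systems, not any particular algorithm, and the example data is invented.

```python
# Sketch: a recommender driven by past user actions -- recommend the item
# most often co-chosen with things the target user already chose.
# The history data is illustrative; this is not the paper's framework.
from collections import Counter

def recommend(history, user):
    """history: dict user -> set of items. Returns one item for `user`, or None."""
    own = history.get(user, set())
    votes = Counter()
    for other, items in history.items():
        if other != user and items & own:          # users who share a past choice
            votes.update(items - own)              # vote for their other items
    return votes.most_common(1)[0][0] if votes else None

history = {
    "alice": {"a", "b"},
    "bob": {"b", "c"},
    "carol": {"a", "c", "d"},
    "dave": {"e"},
}
print(recommend(history, "alice"))  # 'c' -- co-chosen by both bob and carol
```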


Symposium on Principles of Database Systems | 2000

The Web as a graph

Ravi Kumar; Prabhakar Raghavan; Sridhar Rajagopalan; D. Sivakumar; Andrew Tomkins; Eli Upfal

The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph has about a billion nodes today, several billion links, and appears to grow exponentially with time. There are many reasons—mathematical, sociological, and commercial—for studying the evolution of this graph. We first review a set of algorithms that operate on the Web graph, addressing problems from Web search, automatic community discovery, and classification. We then recall a number of measurements and properties of the Web graph. Noting that traditional random graph models do not explain these observations, we propose a new family of random graph models.
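
Among the measurements recalled in this line of work are the Web's in-degree and out-degree distributions, which were observed to follow power laws. The sketch below tabulates an in-degree histogram from an edge list; the synthetic, heavy-tailed data stands in for a real crawl.

```python
# Sketch: tabulating an in-degree histogram from an edge list -- the kind of
# measurement in which the Web's in-degrees were seen to follow a power law.
# The synthetic, heavy-tailed edge list below is illustrative only.
from collections import Counter
import math, random

rng = random.Random(0)
# Targets drawn from a heavy-tailed distribution, so a few nodes get many links.
edges = [(rng.randrange(10_000), int(rng.paretovariate(1.2)) % 10_000)
         for _ in range(100_000)]

in_degree = Counter(v for _, v in edges)
histogram = Counter(in_degree.values())          # how many nodes have in-degree k

for k in sorted(histogram)[:5]:
    print(f"{histogram[k]} nodes have in-degree {k} "
          f"(log-log point: {math.log(k):.2f}, {math.log(histogram[k]):.2f})")
```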


International Conference on Management of Data | 1998

Approximate medians and other quantiles in one pass and with limited memory

Gurmeet Singh Manku; Sridhar Rajagopalan; Bruce G. Lindsay

We present new algorithms for computing approximate quantiles of large datasets in a single pass. The approximation guarantees are explicit, and apply for arbitrary value distributions and arrival distributions of the dataset. The main memory requirements are smaller than those reported earlier by an order of magnitude. We also discuss methods that couple the approximation algorithms with random sampling to further reduce memory requirements. With sampling, the approximation guarantees are explicit but probabilistic, i.e., they apply with respect to a (user-controlled) confidence parameter. We present the algorithms, their theoretical analysis and simulation results on different datasets.
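
A crude sketch of the buffer-and-collapse flavour of one-pass quantile summaries: elements fill fixed-size buffers, and equal-weight buffers are merged by sorting their union and keeping alternate elements with doubled weight. This is not the paper's algorithm (which uses a more careful buffer policy and gives explicit error bounds); it only illustrates how a single pass with limited memory can still yield approximate quantiles.

```python
# Sketch: a one-pass quantile summary in the buffer-and-collapse style.
# Not the paper's algorithm; buffer sizes and the collapse rule are simplified.
def collapse(a, b, weight):
    merged = sorted(a + b)
    return merged[::2], 2 * weight          # keep every other element, double weight

def summarize(stream, buffer_size=100):
    full, current = [], []                   # full: list of (buffer, weight)
    for x in stream:
        current.append(x)
        if len(current) == buffer_size:
            buf, w = sorted(current), 1
            current = []
            # Repeatedly collapse equal-weight buffers, like carrying in binary.
            while any(w2 == w for _, w2 in full):
                i = next(i for i, (_, w2) in enumerate(full) if w2 == w)
                other, _ = full.pop(i)
                buf, w = collapse(buf, other, w)
            full.append((buf, w))
    if current:
        full.append((sorted(current), 1))
    return full

def approx_quantile(full, phi):
    weighted = sorted((v, w) for buf, w in full for v in buf)
    total = sum(w for _, w in weighted)
    target, running = phi * total, 0
    for value, weight in weighted:
        running += weight
        if running >= target:
            return value
    return weighted[-1][0]

import random
rng = random.Random(1)
data = [rng.random() for _ in range(100_000)]
print(round(approx_quantile(summarize(data), 0.5), 2))  # close to the true median, 0.5
```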


International Conference on Management of Data | 1999

Random sampling techniques for space efficient online computation of order statistics of large datasets

Gurmeet Singh Manku; Sridhar Rajagopalan; Bruce G. Lindsay

In a recent paper [MRL98], we described a general framework for single-pass approximate quantile finding algorithms. This framework included several known algorithms as special cases. We identified a new algorithm, within the framework, which had a significantly smaller requirement for main memory than other known algorithms. In this paper, we address two issues left open in our earlier paper. First, all known space-efficient algorithms for approximate quantile finding require advance knowledge of the length of the input sequence. Many important database applications employing quantiles cannot provide this information. In this paper, we present a novel non-uniform random sampling scheme and an extension of our framework. Together, they form the basis of a new algorithm which computes approximate quantiles without knowing the input sequence length. Second, if the desired quantile is an extreme value (e.g., within the top 1% of the elements), the space requirements of currently known algorithms are overly pessimistic. We provide a simple algorithm which estimates extreme values using less space than required by the earlier, more general technique for computing all quantiles. Our principal observation here is that random sampling is quantifiably better at estimating extreme values than at estimating the median.
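
The standard way to sample a stream of unknown length is a reservoir sample; the paper's non-uniform sampling scheme is more refined, but the uniform version sketched below illustrates how a quantile can be estimated in one pass without knowing the input length in advance.

```python
# Sketch: a quantile estimate from a stream of unknown length, via a classic
# uniform reservoir sample. The paper's non-uniform scheme is more refined;
# this stand-in only shows that no advance knowledge of the length is needed.
import random

def reservoir_sample(stream, k, seed=0):
    rng = random.Random(seed)
    sample = []
    for n, x in enumerate(stream, start=1):
        if len(sample) < k:
            sample.append(x)
        else:
            j = rng.randrange(n)        # replace a slot with probability k/n
            if j < k:
                sample[j] = x
    return sample

def sample_quantile(sample, phi):
    ordered = sorted(sample)
    return ordered[min(int(phi * len(ordered)), len(ordered) - 1)]

rng_data = random.Random(7)
stream = (rng_data.random() for _ in range(1_000_000))   # length treated as unknown
sample = reservoir_sample(stream, k=10_000)
print(round(sample_quantile(sample, 0.99), 2))  # near the true 99th percentile, ~0.99
```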
