Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Narayanan Shivakumar is active.

Publication


Featured research published by Narayanan Shivakumar.


Very Large Data Bases | 2000

Computing Geographical Scopes of Web Resources

Junyan Ding; Luis Gravano; Narayanan Shivakumar

Many information resources on the web are relevant primarily to limited geographical communities. For instance, web sites containing information on restaurants, theaters, and apartment rentals are relevant primarily to web users in geographical proximity to these locations. In contrast, other information resources are relevant to a broader geographical community. For instance, an on-line newspaper may be relevant to users across the United States. Unfortunately, current web search engines largely ignore the geographical scope of web resources. In this paper, we introduce techniques for automatically computing the geographical scope of web resources, based on the textual content of the resources, as well as on the geographical distribution of hyperlinks to them. We report an extensive experimental evaluation of our strategies using real web data. Finally, we describe a geographically aware search engine that we have built to showcase our techniques.
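
The paper's actual estimators are not reproduced in this abstract, but one of the signals it names is the geographical distribution of hyperlinks to a resource. The sketch below illustrates that idea with a made-up heuristic: call a resource national when its inbound links spread across many states, or local to a state that dominates the link distribution. All names and thresholds here are hypothetical.

```python
from collections import Counter

def estimate_scope(links_by_state, coverage_threshold=0.5,
                   dominance_threshold=0.8, n_states=50):
    """Guess a resource's geographical scope from the states its
    inbound hyperlinks originate in. Hypothetical heuristic, not the
    paper's estimator: many states => national; one dominant state =>
    that state; otherwise report the set of contributing states."""
    total = sum(links_by_state.values())
    if total == 0:
        return "unknown"
    if len(links_by_state) / n_states >= coverage_threshold:
        return "national"
    state, count = links_by_state.most_common(1)[0]
    if count / total >= dominance_threshold:
        return state
    return sorted(links_by_state)

# An online newspaper linked from 30 states vs. a restaurant guide
# linked almost exclusively from New York:
print(estimate_scope(Counter({f"state{i}": 1 for i in range(30)})))  # national
print(estimate_scope(Counter({"NY": 45, "NJ": 3})))                  # NY
```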


ACM International Conference on Digital Libraries | 1996

Building a scalable and accurate copy detection mechanism

Narayanan Shivakumar; Hector Garcia-Molina

Often, publishers are reluctant to offer valuable digital documents on the Internet for fear that they will be re-transmitted or copied widely. A Copy Detection Mechanism can help identify such copying. For example, publishers may register their documents with a copy detection server, and the server can then automatically check public sources such as UseNet articles and Web sites for potential illegal copies. The server can search for exact copies, and also for cases where significant portions of documents have been copied. In this paper we study, for the first time, the performance of various copy detection mechanisms, including the disk storage requirements, main memory requirements, response times for registration, and response time for querying. We also contrast performance to the accuracy of the mechanisms (how well they detect partial copies). The results are obtained using SCAM, an experimental server we have implemented, and a collection of 50,000 netnews articles.
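
The abstract does not spell out SCAM's registration scheme, so the sketch below shows the general shape of such a copy detection server under assumed details: documents are registered as sets of hashed word n-grams, and a query reports registered documents sharing a large fraction of the query's fingerprints, which also surfaces partial copies.

```python
import hashlib
from collections import defaultdict

class CopyDetectionServer:
    """Toy fingerprint-based copy detector (assumed details, not
    SCAM's actual scheme). Registration stores hashes of overlapping
    word n-grams; a query reports registered documents sharing at
    least `threshold` of the query's fingerprints."""

    def __init__(self, n=5, threshold=0.3):
        self.n, self.threshold = n, threshold
        self.index = defaultdict(set)   # fingerprint -> {doc_id, ...}

    def _fingerprints(self, text):
        words = text.lower().split()
        grams = (" ".join(words[i:i + self.n])
                 for i in range(max(1, len(words) - self.n + 1)))
        return {hashlib.sha1(g.encode()).hexdigest()[:16] for g in grams}

    def register(self, doc_id, text):
        for fp in self._fingerprints(text):
            self.index[fp].add(doc_id)

    def query(self, text):
        fps = self._fingerprints(text)
        if not fps:
            return {}
        hits = defaultdict(int)
        for fp in fps:
            for doc in self.index.get(fp, ()):
                hits[doc] += 1
        # Fraction of the query's fingerprints found in each document;
        # high fractions indicate full or partial copies.
        return {doc: c / len(fps) for doc, c in hits.items()
                if c / len(fps) >= self.threshold}
```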


International Conference on Management of Data | 2000

Finding replicated Web collections

Junghoo Cho; Narayanan Shivakumar; Hector Garcia-Molina

Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times. In this paper, we make the case for identifying replicated documents and collections to improve web crawlers, archivers, and ranking functions used in search engines. The paper describes how to efficiently identify replicated documents and hyperlinked document collections. The challenge is to identify these replicas from an input data set of several tens of millions of web pages and several hundreds of gigabytes of textual data. We also present two real-life case studies where we used replication information to improve a crawler and a search engine. We report these results for a data set of 25 million web pages (about 150 gigabytes of HTML data) crawled from the web.
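
As a rough illustration of the first step, replicated pages can be grouped by a content fingerprint, and sites that co-host many replicated pages become candidates for mirrored collections. This is a toy sketch under assumptions of my own (exact-content hashing only); the paper also handles near-replicas and exploits hyperlink structure.

```python
import hashlib
from collections import defaultdict
from itertools import combinations
from urllib.parse import urlparse

def find_replica_clusters(pages):
    """pages: iterable of (url, text) pairs. Returns clusters of
    exact-content replicas, plus counts of site pairs that co-host
    replicated pages (candidate mirrored collections)."""
    by_hash = defaultdict(list)
    for url, text in pages:
        by_hash[hashlib.sha1(text.encode()).hexdigest()].append(url)

    site_pairs = defaultdict(int)
    for urls in by_hash.values():
        sites = {urlparse(u).netloc for u in urls}
        for a, b in combinations(sorted(sites), 2):
            site_pairs[(a, b)] += 1   # the two sites share a replica

    clusters = [urls for urls in by_hash.values() if len(urls) > 1]
    # Site pairs with a high count likely mirror a whole collection,
    # so a crawler can fetch the collection from just one of them.
    return clusters, site_pairs
```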


International Workshop on the World Wide Web and Databases | 1998

Finding Near-Replicas of Documents on the Web

Narayanan Shivakumar; Hector Garcia-Molina

We consider how to efficiently compute the overlap between all pairs of web documents. This information can be used to improve web crawlers and web archivers, and in the presentation of search results, among other applications. We report statistics on how common replication is on the web, and on the cost of computing the above information for a relatively large subset of the web – about 24 million web pages, corresponding to about 150 gigabytes of textual information.
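
A standard way to compute all-pairs overlap without comparing every pair of documents directly is to invert a chunk-fingerprint index and count co-occurrences along its postings lists. The sketch below shows that pattern under my own assumptions; the paper's method must additionally scale out of core to tens of millions of pages.

```python
from collections import defaultdict
from itertools import combinations

def pairwise_overlap(doc_chunks):
    """doc_chunks: {doc_id: set of chunk fingerprints}. Counts shared
    chunks for every document pair by walking an inverted index, so
    pairs with nothing in common are never enumerated."""
    postings = defaultdict(list)        # fingerprint -> [doc_id, ...]
    for doc, chunks in doc_chunks.items():
        for c in chunks:
            postings[c].append(doc)

    overlap = defaultdict(int)          # (doc_a, doc_b) -> shared chunks
    for docs in postings.values():
        for a, b in combinations(sorted(docs), 2):
            overlap[(a, b)] += 1
    return overlap

print(pairwise_overlap({"d1": {1, 2, 3}, "d2": {2, 3, 4}, "d3": {9}}))
# {('d1', 'd2'): 2} -- d3 overlaps nothing and is never compared
```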


ACM/IEEE International Conference on Mobile Computing and Networking | 1995

User profile replication for faster location lookup in mobile environments

Narayanan Shivakumar; Jennifer Widom

We consider per-user profile replication as a mechanism for faster location lookup of mobile users in a Personal Communications Service system. We present a minimum-cost maximum-flow based algorithm to compute the set of sites at which a user profile should be replicated given known calling and user mobility patterns. We then present schemes for replication plans that gracefully adapt to changes in the calling and mobility patterns.
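
The minimum-cost maximum-flow construction is not given in this abstract; the sketch below instead shows the per-site trade-off any such plan optimizes: replicate a user's profile at a site only when the lookups it saves outweigh the updates forced by the user's moves. The rates, costs, and the absence of capacity constraints are all assumptions of this sketch.

```python
def sites_to_replicate(call_rates, move_rate,
                       lookup_saving=1.0, update_cost=1.0):
    """call_rates: {site: calls per day placed from that site to the
    user}. move_rate: the user's location updates per day (each move
    rewrites every replica). Replicate wherever saved lookups beat
    forced updates; the paper's flow formulation additionally
    enforces per-user and per-site replica limits."""
    return [site for site, calls in call_rates.items()
            if calls * lookup_saving > move_rate * update_cost]

# A user called 20x/day from site A and 2x/day from B, moving 5x/day:
print(sites_to_replicate({"A": 20, "B": 2}, move_rate=5))  # ['A']
```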


Mobile Networks and Applications | 1997

Per-user profile replication in mobile environments: algorithms, analysis, and simulation results

Narayanan Shivakumar; Jan Jannink; Jennifer Widom

We consider per-user profile replication as a mechanism for faster location lookup of mobile users in a personal communications service system. We present a minimum-cost maximum-flow based algorithm to compute the set of sites at which a user profile should be replicated given known calling and user mobility patterns. We show the costs and benefits of our replication algorithm against previous location lookup approaches through analysis. We also simulate our algorithm against other location lookup algorithms on a realistic model of a geographical area to evaluate critical system performance measures. A notable aspect of our simulations is that we use well-validated models of user calling and mobility patterns.
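
As a toy stand-in for the paper's simulations (which use well-validated calling and mobility models), one can Monte Carlo a single user's daily calls and moves under an assumed cost model and compare total cost with and without a replica; every cost parameter below is invented.

```python
import random

def daily_cost(calls_per_day, moves_per_day, replicated, days=1000,
               remote_lookup=1.0, local_lookup=0.1, update=1.0, seed=0):
    """Average daily cost for one user observed from one site, under
    an invented cost model: a replica makes lookups cheap but every
    move must propagate to it."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(days):
        # Crude hourly Bernoulli approximation of daily call/move counts.
        calls = sum(rng.random() < calls_per_day / 24 for _ in range(24))
        moves = sum(rng.random() < moves_per_day / 24 for _ in range(24))
        if replicated:
            total += calls * local_lookup + moves * update
        else:
            total += calls * remote_lookup
    return total / days

for rep in (False, True):
    print("replicated" if rep else "central only",
          round(daily_cost(20, 5, replicated=rep), 2))
# Replication wins here because 20 * (1.0 - 0.1) > 5 * 1.0, matching
# the threshold rule sketched for the conference version above.
```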


Measurement and Modeling of Computer Systems | 2000

Crawler-Friendly Web Servers

Onn Brandman; Junghoo Cho; Hector Garcia-Molina; Narayanan Shivakumar

In this paper we study how to make web servers (e.g., Apache) more crawler-friendly. Current web servers offer the same interface to crawlers and regular web surfers, even though crawlers and surfers have very different performance requirements. We evaluate simple, easy-to-incorporate modifications to web servers that yield significant bandwidth savings. Specifically, we propose that web servers export meta-data archives describing their content.
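
The abstract does not fix the meta-data archive format, so the sketch below invents a simple one: a tab-separated file of URL, last-modified time, and size that a crawler can fetch once and diff against its previous snapshot, re-crawling only changed entries instead of issuing unconditional GETs for everything.

```python
import os

def export_metadata(doc_root, base_url, out_path="metadata.tsv"):
    """Write one line per served file: URL, last-modified epoch
    seconds, and size in bytes. The format is invented; the paper
    evaluates several export variants."""
    with open(out_path, "w") as out:
        for dirpath, _, files in os.walk(doc_root):
            for name in files:
                path = os.path.join(dirpath, name)
                rel = os.path.relpath(path, doc_root).replace(os.sep, "/")
                st = os.stat(path)
                out.write(f"{base_url}/{rel}\t{int(st.st_mtime)}\t{st.st_size}\n")

# A crawler fetches metadata.tsv once, compares it with its stored
# copy, and re-crawls only URLs whose timestamp or size changed.
```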


International Conference on Multimedia Computing and Systems | 1995

The Concord algorithm for synchronization of networked multimedia streams

Narayanan Shivakumar; Cormac J. Sreenan; Balakrishnan Narendran; Prathima Agrawal

Synchronizing different data streams from multiple sources simultaneously at a receiver is one of the basic problems in distributed multimedia systems. This requirement stems from the nature of packet-based networks, which can introduce end-to-end delays that vary both within and across streams. We present a new algorithm called Concord, which provides an integrated solution for these single- and multiple-stream synchronization problems. It is notable because it defines a single framework to deal with both problems, and it operates under the influence of parameters which can be supplied by the application involved. In particular, these parameters are used to allow a trade-off between the packet loss rates, total end-to-end delay, and skew for each of the streams. For applications like conferencing, this is used to reduce delay by determining the minimum buffer delay/size required.
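
Concord's actual computation is only characterized here by its parameters, so the following is a hypothetical sketch of one way such a loss/delay trade-off can be realized: set each stream's playout delay to a percentile of its observed end-to-end delays (a higher percentile drops fewer late packets but buffers longer), then equalize streams to remove skew.

```python
import math

def playout_delay(observed_delays_ms, max_loss_rate=0.01):
    """Smallest playout delay such that at most `max_loss_rate` of
    the observed packets would have missed their deadline
    (illustrative only, not Concord's actual computation)."""
    d = sorted(observed_delays_ms)
    idx = max(0, min(len(d) - 1,
                     math.ceil(len(d) * (1 - max_loss_rate)) - 1))
    return d[idx]

def equalize_streams(delays_by_stream, max_loss_rate=0.01):
    """Inter-stream sync: play every stream at the largest per-stream
    delay, so skew between streams is removed at the receiver."""
    per_stream = {s: playout_delay(d, max_loss_rate)
                  for s, d in delays_by_stream.items()}
    return max(per_stream.values()), per_stream

target, per_stream = equalize_streams(
    {"audio": [40, 41, 42, 45, 90], "video": [60, 61, 62, 65, 120]},
    max_loss_rate=0.2)   # tolerate 20% late packets to cut delay
print(target, per_stream)  # 65 {'audio': 45, 'video': 65}
```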


International Conference on Data Engineering | 1998

Safeguarding and charging for information on the Internet

Hector Garcia-Molina; Steven P. Ketchpel; Narayanan Shivakumar

With the growing acceptance of the Internet as a new dissemination medium, several new and interesting challenges arise in building a digital commerce infrastructure. We discuss some of the issues that arise in building such an infrastructure. In particular, we study how one can find and pay for digital information, and how one can safeguard the information from invalid access and duplication. We use examples from our Stanford Digital Library Project to illustrate some of these problems and their potential solutions.


International Conference on Management of Data | 1997

Wave-indices: indexing evolving databases

Narayanan Shivakumar; Hector Garcia-Molina

In many applications, new data is generated every day. Often an index over the data from a recent window of days is required to answer queries efficiently. For example, in a warehouse one may need an index on the sales records of the last week for efficient data mining, or in a Web service one may provide an index of Netnews articles from the past month. In this paper, we propose a variety of wave indices in which a new day's data can be efficiently added and old data can be quickly expired to maintain the required window. We compare these schemes on several system performance measures, such as storage, query response time, and maintenance work, as well as on their simplicity and ease of coding.
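
The individual wave-index variants are not described in this abstract; the sketch below shows the general idea under assumptions of my own: keep one sub-index per day, so that adding a new day and expiring the oldest touch only one day's data, at the price of consulting several sub-indexes per query.

```python
from collections import deque, defaultdict

class WindowIndex:
    """Toy sliding-window index: one inverted sub-index per day.
    Adding a day and expiring the oldest are proportional to one
    day's data, never the whole window; queries consult every per-day
    sub-index. One point in the trade-off space the paper compares
    (illustrative only)."""

    def __init__(self, window_days):
        self.window = window_days
        self.days = deque()  # (day, {term: [record_id, ...]})

    def add_day(self, day, records):
        index = defaultdict(list)
        for rec_id, text in records:
            for term in set(text.lower().split()):
                index[term].append(rec_id)
        self.days.append((day, index))
        while len(self.days) > self.window:
            self.days.popleft()          # expire the oldest day wholesale

    def lookup(self, term):
        term = term.lower()
        return [rid for _, idx in self.days for rid in idx.get(term, [])]

idx = WindowIndex(window_days=7)
idx.add_day("2024-01-01", [(1, "sales of widgets"), (2, "widget returns")])
print(idx.lookup("widgets"))  # [1]
```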

Collaboration


Dive into Narayanan Shivakumar's collaboration.

Top Co-Authors


Junghoo Cho

University of California
